EFGs: A Complete and Accurate Implementation of Ertl’s Functional Group Detection Algorithm in RDKit

11 December 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Functional groups are widely used in Organic Chemistry because they provide a rationale for analyzing physicochemical and reactivity properties. In Medicinal Chemistry they are the basis for analyzing ligand-biomacromolecule interactions. Ertl’s algorithm extracts functional groups from arbitrary organic molecules without depending on predefined libraries of functional groups. However, the widely used RDKit cheminformatics toolkit lacks a complete and accurate implementation of Ertl’s algorithm. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides: a) a PNG binary string with an image of the molecule with color-highlighted functional groups; b) a list of sets of atom indices (idx), each set corresponding to a functional group; c) a list of canonicalized pseudo-SMILES strings for the full functional groups; d) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at github.com/bbu-imdea/efgs.
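To make output b) concrete, the following is a minimal sketch of the core idea behind Ertl's approach: mark heteroatoms and the carbons joined to other atoms by non-aromatic multiple bonds, then merge connected marked atoms into groups of atom indices. It is not the EFGs code and does not use the EFGs or RDKit API. The hand-coded molecule (`ATOMS`, `BONDS`) and the helper names `mark_atoms` and `merge_groups` are hypothetical, and the sketch deliberately omits the acetal-carbon, three-membered-ring, and aromatic-heteroatom rules of the full algorithm, as well as the carbon environment that makes up the "full" functional groups.

```python
from collections import defaultdict

# Hypothetical hand-coded molecule: 2-aminoethyl acetate, CC(=O)OCCN.
# Index:  0    1    2    3    4    5    6
ATOMS = ["C", "C", "O", "O", "C", "C", "N"]
# Each bond is (atom i, atom j, bond order, is_aromatic).
BONDS = [
    (0, 1, 1, False),
    (1, 2, 2, False),  # C=O
    (1, 3, 1, False),
    (3, 4, 1, False),
    (4, 5, 1, False),
    (5, 6, 1, False),
]


def mark_atoms(atoms, bonds):
    """Return indices of atoms that belong to some functional group (simplified)."""
    # Every heteroatom is marked.
    marked = {i for i, element in enumerate(atoms) if element not in ("C", "H")}
    # Both ends of a non-aromatic double or triple bond are marked
    # (covers C=C, C#C and carbons multiply bonded to a heteroatom).
    for i, j, order, aromatic in bonds:
        if order > 1 and not aromatic:
            marked.update((i, j))
    return marked


def merge_groups(marked, bonds):
    """Merge marked atoms that are bonded to each other into sets of indices."""
    neighbors = defaultdict(set)
    for i, j, _order, _aromatic in bonds:
        if i in marked and j in marked:
            neighbors[i].add(j)
            neighbors[j].add(i)

    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        # Depth-first walk over marked neighbors only.
        group, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in group:
                continue
            group.add(atom)
            stack.extend(neighbors[atom] - group)
        seen |= group
        groups.append(group)
    return groups


groups = merge_groups(mark_atoms(ATOMS, BONDS), BONDS)
print(groups)  # ester atoms {1, 2, 3} and the amine nitrogen {6}
```

In this toy case the carbonyl carbon, carbonyl oxygen, and ester oxygen merge into one group, while the amine nitrogen, separated from them by unmarked carbons, forms a group of its own. A real implementation would read atoms and bonds from an RDKit molecule rather than from hand-written lists.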

Keywords

Ertl algorithm
functional group
drug discovery
cheminformatics
ligand-receptor interactions
