Abstract
Functional groups are widely used in organic chemistry, as they provide a rationale for analyzing physicochemical and reactivity properties. In medicinal chemistry, they are the basis for analyzing ligand-biomacromolecule interactions. Ertl’s algorithm is an approach for extracting functional groups from arbitrary organic molecules that does not depend on predefined libraries of functional groups. However, the widely used RDKit cheminformatics toolkit lacks a complete and accurate implementation of Ertl’s algorithm. This paper describes a new RDKit/Python implementation of the algorithm that is both accurate and complete. For an RDKit molecule, it provides: a) a PNG binary string with an image of the molecule in which the functional groups are highlighted in color; b) a list of sets of atom indices (idx), each set corresponding to a functional group; c) a list of canonicalized pseudo-SMILES strings for the full functional groups; d) a list of labeled RDKit mol objects, one for each full functional group. The code is freely available at github.com/bbu-imdea/efgs
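Output (b), a list of sets of atom indices, arises in Ertl’s algorithm from merging connected marked atoms into single functional groups. The sketch below illustrates only that merging step, in plain Python and under stated assumptions. The molecule is a hypothetical adjacency dictionary rather than an RDKit molecule (with RDKit, neighbours could be read via `Atom.GetNeighbors()`), and the set of marked atoms is assumed to have been computed already by the marking rules. The function name `group_marked_atoms` is illustrative and is not part of the efgs package.

```python
from collections import deque


def group_marked_atoms(adjacency, marked):
    """Merge connected marked atoms into groups (one set of atom indices per group).

    adjacency: dict mapping atom index -> iterable of neighbour indices
               (hypothetical plain-Python stand-in for a molecular graph).
    marked:    set of atom indices already flagged by the marking rules (assumed given).
    """
    groups = []
    seen = set()
    # Iterate in sorted order so the output order is deterministic.
    for start in sorted(marked):
        if start in seen:
            continue
        # Breadth-first search that only walks through marked atoms.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency.get(atom, ()):
                if nbr in marked and nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        groups.append(component)
    return groups


# 3-hydroxypropanamide, SMILES "OCCC(=O)N", atoms indexed in SMILES order:
# 0 O, 1 C, 2 C, 3 C, 4 O, 5 N
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}
# Assumed marking: heteroatoms 0, 4, 5 and carbon 3 (double-bonded to O)
marked = {0, 3, 4, 5}
print(group_marked_atoms(adjacency, marked))
```

For this example the hydroxyl oxygen is not connected to the other marked atoms, so it forms its own group, while the amide carbon, oxygen and nitrogen merge into a second group. A complete implementation would also apply the marking rules and attach the environment atoms needed for the full functional groups.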