These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.
manuscript.pdf (7.35 MB)
Scope of 3D Shape-Based Approaches in Predicting the Macromolecular Targets of Structurally Complex Small Molecules Including Natural Products and Macrocyclic Ligands
Preprints are manuscripts made publicly available before they have been submitted for formal peer review and publication. They might contain new research findings or data. Preprints can be a draft or final version of an author's research but must not have been accepted for publication at the time of submission.
A plethora of similarity-based, network-based, machine learning and docking approaches for predicting the macromolecular targets of small molecules are available today and recognized as valuable tools for providing guidance in early drug discovery. With the increasing maturity of target prediction methods, researchers have started to explore ways to expand their scope to more challenging molecules such as structurally complex natural products and macrocyclic small molecules. In this work we systematically explore the capacity of an alignment-based approach to identify the targets of structurally complex small molecules (including large and flexible natural products and macrocyclic compounds) based on the similarity of their 3D molecular shape to non-complex molecules (i.e. more conventional, "drug-like", synthetic compounds). For this analysis, query sets of ten representative, structurally complex molecules were compiled for each of 35 pharmaceutically relevant proteins. Subsequently, ROCS, a leading shape-based screening engine, was utilized to generate rank-ordered lists of the potential targets of the 35x10 queries according to the similarity of their 3D molecular shape with that of compounds from a knowledge base of 272 640 non-complex small molecules active on a total of 3642 different proteins. Four of the scores implemented in ROCS were explored for target ranking, with the TanimotoCombo score consistently outperforming all others. The score successfully recovered the targets of 29% and 40% of the 350 queries among the top-5 and top-20 positions, respectively. For 29 out of the 35 investigated targets (83%), the method correctly assigned the first rank (out of 3642) to the target of interest for at least one of the ten queries. The shape-based target prediction approach showed remarkable robustness, with good success rates obtained even for compounds that are clearly distinct from any of the ligands present in the knowledge base. However, complex natural products and macrocyclic compounds proved to be challenging even with this approach, although cases of complete failure were recorded only for a small number of targets.