Key Molecular Descriptors Distinguishing Between Synthetic and Natural Products

Nathaniel Thomas; Sweekrit Bhatnagar; Avirral Agarwal; Anya Iyer; Niranjana Sankar; Manav Bhargava; Spencer Ye; Edwin Li; Ritam Nandi; Edward Njoo; Robert Downing

doi:10.26434/chemrxiv-2023-t2lwf

Theoretical and Computational Chemistry

03 October 2023, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Distinguishing natural products (NPs) from synthetic molecules (SMs) with machine learning reveals the features that differentiate the two classes and thereby motivates further research in natural product-based drug design. Natural products generally have higher chemical diversity and biochemical specificity, among other properties, which makes them favorable lead structures for drug discovery and sets them apart from synthetic molecules. Here, we propose a machine-learning approach that uses the PaDEL descriptor software to classify compounds as NPs or SMs from a variety of molecular features. Several supervised learning algorithms, including Logistic Regression, Naive Bayes, Random Forests, and Decision Trees, were tested to identify the most important molecular descriptors and to obtain the highest accuracy. The best-performing method, Random Forests, reached an accuracy of 89.19%, comparable with previous models performing the same classification. Identifying the properties that distinguish natural products from synthetic compounds allows for a better understanding of available chemical data and better incorporation of such properties into small-molecule drug discovery.
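
The workflow summarized in the abstract (train several supervised classifiers on a descriptor table, compare their accuracy, then rank descriptors by importance) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn is installed, uses a synthetic dataset from `make_classification` as a stand-in for a PaDEL descriptor table, and the names `descriptor_0` ... `descriptor_19`, the 75/25 split, and all hyperparameters are hypothetical choices made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a descriptor table (NOT the paper's data):
# each row is a molecule, each column a descriptor, y = 1 for NP and 0 for SM.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6, random_state=0
)
feature_names = [f"descriptor_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out a stratified test set (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four algorithm families named in the abstract, with illustrative settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Rank descriptors by the random forest's impurity-based importance.
forest = models["Random Forest"]
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.3f}")
for name, importance in ranked[:5]:
    print(f"{name}: importance = {importance:.3f}")
```

With real data, `X` would be replaced by the descriptor matrix exported from PaDEL and `y` by the NP/SM labels. The accuracies printed here describe only the synthetic data and say nothing about the 89.19% reported in the abstract.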

Keywords

Cheminformatics

Molecular descriptors

PaDEL Descriptors

Supplementary materials

Supporting Information: Key Molecular Descriptors Distinguishing Between Synthetic and Natural Products

Supporting information including dataset parameters, raw code, and supplementary figures

Supplementary weblinks

GitHub link to raw code

This contains the raw code used for the analyses presented in the paper.

License

The content is available under CC BY NC 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
