Key Molecular Descriptors Distinguishing Between Synthetic and Natural Products

03 October 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Classifying natural products (NPs) versus synthetic molecules (SMs) with machine learning techniques reveals the features that differentiate the two classes and thereby motivates further research in natural product-based drug design. Natural products generally have higher chemical diversity and biochemical specificity, among other properties, which makes them favorable as lead structures for drug discovery and differentiates them from synthetic molecules. Here, we propose a machine-learning approach that uses molecular descriptors computed with the PaDEL descriptor software to classify compounds as NPs or SMs from a variety of molecular features. A set of supervised learning algorithms, including Logistic Regression, Naive Bayes, Random Forests, and Decision Trees, was tested to determine which molecular descriptors carry the greatest feature importance and which algorithm achieves the highest accuracy. The best-performing method outlined in this paper, Random Forests, reached an accuracy of 89.19%, comparable with previous models performing the same classification. Identifying and classifying the distinguishing properties of natural products and synthetic compounds allows for a better understanding of available chemical data and better incorporation of such properties in small-molecule drug discovery.
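The workflow described in the abstract (train several supervised classifiers on a descriptor table, compare their accuracy, then rank descriptors by Random Forest feature importance) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn and NumPy, and it uses a synthetic matrix from `make_classification` as a stand-in for a real PaDEL descriptor table. The descriptor names, sample counts, train/test split, and hyperparameters are all hypothetical, so the printed accuracies have no relation to the 89.19% reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for a PaDEL descriptor table.
# Rows are molecules, columns are descriptors; y = 1 for NP, 0 for SM.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    n_redundant=2, class_sep=1.5, random_state=0,
)
# Hypothetical column names; a real table would carry PaDEL descriptor names.
descriptor_names = [f"descriptor_{i}" for i in range(X.shape[1])]

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four algorithm families named in the abstract, with assumed settings.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record its accuracy on the held-out molecules.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)

# Rank descriptors by the Random Forest's impurity-based importance.
forest = models["Random Forest"]
ranking = np.argsort(forest.feature_importances_)[::-1]
top_descriptors = [descriptor_names[i] for i in ranking[:5]]

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
print("Top descriptors:", top_descriptors)
```

With real data, `X` and `descriptor_names` would come from the descriptor file exported by PaDEL and `y` from the NP/SM labels of the source datasets. Impurity-based importances can be biased toward high-variance descriptors, so a permutation-based check may be a useful complement.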


Molecular descriptors
PaDEL Descriptors

Supplementary materials

Supporting Information: Key Molecular Descriptors Distinguishing Between Synthetic and Natural Products
Supporting information including dataset parameters, raw code, and supplementary figures
