Abstract
Chemoinformatic tools have been widely used to analyze the properties of large sets of natural
compounds, mostly in the context of drug discovery. By contrast, few reports have used these tools to
answer basic biological questions. In this work, we applied unsupervised machine learning
techniques to assess the diversity and complexity of a set of natural steroids by characterizing
them through simple topological and physicochemical molecular descriptors. Most notably,
these properties, derived from the molecular graphs of the compounds, are
closely related to the compounds' biological functions and biosynthetic origins. Moreover, the
diversification of these properties follows a trend that parallels metabolic evolution,
demonstrating the potential contribution of these computational approaches to a better
understanding of the vast wealth of natural products.