Abstract
The evolution of metabolism is a longstanding yet unresolved question, and several hypotheses have been proposed to address this complex process from a Darwinian point of view. Modern statistical and bioinformatic approaches to comparative genome analysis are being used to detect signatures of natural selection at the gene and population levels, in an attempt to understand the origin of primordial metabolism and its expansion. These studies, however, remain centered mainly on genes and the proteins they encode, largely neglecting the small organic molecules that support life processes. In this work, we selected steroids, an ancient family of metabolites widely distributed across eukaryotes, and applied unsupervised machine learning techniques to reveal the traits that natural selection has imprinted on their molecular properties over the course of evolution. Our results show that sterols, the earliest steroids to appear, have more conserved properties, and that more complex compounds with increasingly diverse properties emerged thereafter, suggesting that chemical diversification parallels the expansion of biological complexity. In a wider context, these findings highlight the value of chemoinformatic approaches for better understanding the evolution of metabolism.