Abstract
Mathematically enumerating chemical space can generate an almost infinite number of molecules, and it is hard to know which of them are experimentally relevant. A way to explore the chemical space of known molecules as a function of their relative complexity could help us understand biological processes and find new relationships. Assembly theory provides an approach to exploring and comparing the intrinsic complexity of molecules, quantified by the minimum number of steps needed to build the target molecular graphs. Here we show that assembly theory can be applied to networks of molecules to explore the assembly properties of common motifs, and that these motifs can be used to define a tree of assembly spaces. This allows us to explore the accessible molecules connected to the tree rather than the entire space of possible molecules. We apply this approach to prebiotic chemistry, gene sequences, a family of plasticizers and the well-known opiate class of natural products. This analysis allows us to quantify the amount of external information needed to assemble the tree, and to identify and predict new components of this family of molecules. Finally, by developing a new reassembly system that uses the disassembly motifs, we found that, in the case of the opiates, a new set of opiate-like drug candidates could be generated that would not be accessible via conventional fragment-based drug design, demonstrating how this approach might find application in drug discovery.
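To make the central quantity concrete, the following is a minimal sketch of the "minimum number of steps" idea for the simplest case, a string such as a gene sequence. It assumes that the basic units are the individual characters, that each step joins two already available objects, and that anything built along the way can be reused at no extra cost. The function name `assembly_index` and the brute-force iterative-deepening search are illustrative choices, not the authors' implementation, and the search is only practical for short strings.

```python
from itertools import product


def assembly_index(target: str) -> int:
    """Hypothetical helper: minimum number of pairwise joins to build `target`.

    Assumptions (illustrative only): basic units are the distinct characters
    of `target`, and every string built along the way stays available for
    reuse without being counted again.
    """
    if len(target) <= 1:
        return 0  # a single basic unit needs no joins

    n = len(target)
    # Any intermediate that is not a substring of the target can never be
    # part of it, so only substrings are worth building.
    substrings = {target[i:j] for i in range(n) for j in range(i + 1, n + 1)}

    def search(pool: frozenset, steps_left: int) -> bool:
        """Depth-limited search: can `target` be reached in `steps_left` joins?"""
        if target in pool:
            return True
        if steps_left == 0:
            return False
        # Prune: each join at most doubles the length of the longest object.
        longest = max(len(s) for s in pool)
        if longest * 2 ** steps_left < n:
            return False
        for a, b in product(pool, repeat=2):
            joined = a + b
            if joined in substrings and joined not in pool:
                if search(pool | {joined}, steps_left - 1):
                    return True
        return False

    start = frozenset(target)  # the set of single-character building blocks
    depth = 0
    # Iterative deepening: the first depth that succeeds is the minimum.
    while not search(start, depth):
        depth += 1
    return depth


if __name__ == "__main__":
    for s in ["abab", "abcabc", "banana"]:
        print(s, assembly_index(s))
```

For example, `assembly_index("abcabc")` returns 3 (join `ab`, then `abc`, then `abc` with itself), whereas building the same string one character at a time would take 5 joins; the saving comes entirely from reusing the repeated motif, which is the property the assembly-tree analysis exploits.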