Abstract
Mapping chemical space by enumerating graphs generates an infinite number of molecules, yet experimental exploration shows that known chemical space becomes sparser as the molecular weight of the compounds increases. What is needed is a way to explore chemical space that exploits the information encoded in known molecules to give access to unknown chemical space by building on the conserved structures common to related families of molecules. Molecular assembly theory provides an approach to explore and compare the intrinsic complexity of molecules through the minimum number of steps needed to build up the target graphs. Here we show that it can be applied to networks of molecules to explore the assembly properties of common motifs, rather than focusing only on molecules in isolation. Molecular assembly theory can therefore be used to define a tree of assembly spaces, allowing us to explore the accessible molecules connected to the tree rather than the entire space of possible molecules. This approach maps the relationship between the molecules and their common fragments and thus measures the distribution of structural information collectively embedded in the molecules. We apply this approach to prebiotic chemistry, specifically the construction of RNA, to a family of opiates and plasticizers, and to gene sequences. This analysis allows us to quantify the amount of external information needed to assemble the tree and to identify and predict new components in this family of molecules, based on the contingent information in the assembly spaces.