Abstract
The growing capabilities of synthetic biology and organic chemistry demand tools to guide syntheses towards useful molecules. Here, we present MACAW (Molecular AutoenCoding Auto-Workaround), a tool that uses a novel approach to generate molecules predicted to meet a desired property specification (e.g. a binding affinity of 50 nM or an octane number of 90). MACAW describes molecules by embedding them into a smooth multidimensional numerical space, avoiding the uninformative dimensions that previous methods often introduce. The coordinates in this embedding are a natural choice of features for accurately predicting molecular properties, which we demonstrate with examples for cetane and octane numbers, flash points, and histamine H1 receptor binding affinity. The approach is computationally efficient and well-suited to the small- and medium-size datasets commonly used in the biosciences. We showcase the utility of MACAW for virtual screening by identifying molecules with high predicted binding affinity to the histamine H1 receptor and limited affinity to the muscarinic M2 receptor, both targets of medicinal relevance. Combining these predictive capabilities with a novel generative algorithm for molecules allows us to recommend molecules with a desired property value (i.e. inverse molecular design). We demonstrate this capability by recommending molecules with predicted octane numbers of 40, 80, and 120, octane number being an important characteristic for biofuels. Thus, MACAW augments classical retrosynthesis tools by recommending molecules that meet a given specification.
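The idea of recommending molecules "on specification" can be illustrated with a minimal sketch: given some property predictor and a pool of candidate molecules, keep the candidates whose predicted value lies closest to the target. This is not the MACAW implementation or its API. The function `recommend_on_spec`, the candidate identifiers, and the predicted values below are all hypothetical placeholders chosen only for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def recommend_on_spec(
    candidates: Iterable[str],
    predict: Callable[[str], float],
    target: float,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k candidates whose predicted property is closest to target."""
    # Score every candidate with the supplied (hypothetical) predictor.
    scored = [(mol, predict(mol)) for mol in candidates]
    # Rank by absolute deviation from the requested property value.
    scored.sort(key=lambda pair: abs(pair[1] - target))
    return scored[:k]


# Placeholder predictions standing in for a trained property model.
# The identifiers and values are illustrative only, not real data.
toy_predictions = {
    "mol_A": 35.0,
    "mol_B": 78.5,
    "mol_C": 91.0,
    "mol_D": 118.0,
    "mol_E": 82.0,
}

# Query the same candidate pool for three different target values.
for target in (40, 80, 120):
    best = recommend_on_spec(toy_predictions, toy_predictions.__getitem__, target, k=2)
    print(target, best)
```

In practice the predictor would be a model trained on molecular features, and the candidate pool would come from a library or a generative step rather than a fixed dictionary.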
Supplementary materials
Title: Supporting Information for MACAW
Description: Supporting figures
Supplementary weblinks
Title: Repository for MACAW software
Description: Link to the MACAW software and instructions