MACAW: an accessible tool for molecular embedding and inverse molecular design

09 March 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The growing capabilities of synthetic biology and organic chemistry demand tools to guide syntheses towards useful molecules. Here, we present MACAW (Molecular AutoenCoding Auto-Workaround), a tool that uses a novel approach to generate molecules predicted to meet a desired property specification (e.g., a binding affinity of 50 nM or an octane number of 90). MACAW describes molecules by embedding them into a smooth multidimensional numerical space, avoiding the uninformative dimensions that previous methods often introduce. The coordinates in this embedding provide a natural choice of features for accurately predicting molecular properties, which we demonstrate with examples for cetane and octane numbers, flash points, and histamine H1 receptor binding affinity. The approach is computationally efficient and well suited to the small and medium-sized datasets commonly used in the biosciences. We showcase the utility of MACAW for virtual screening by identifying molecules with high predicted binding affinity to the histamine H1 receptor and limited affinity to the muscarinic M2 receptor, two targets of medicinal relevance. Combining these predictive capabilities with a novel generative algorithm for molecules allows us to recommend molecules with a desired property value, that is, to perform inverse molecular design. We demonstrate this capability by recommending molecules with predicted octane numbers of 40, 80, and 120; octane number is an important characteristic of biofuels. Thus, MACAW augments classical retrosynthesis tools by recommending molecules that meet a given specification.
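To make the idea of embedding molecules as numerical coordinates concrete, the following is a minimal sketch and not the MACAW algorithm itself, which the abstract does not specify. It assumes that each molecule is given as a SMILES string, that character bigrams of the string serve as a crude stand-in for a chemical fingerprint, and that a molecule's coordinates are its Tanimoto distances to a few hand-picked reference ("landmark") molecules. The function names `ngrams`, `tanimoto_distance`, and `embed`, and the landmark and library lists, are hypothetical and chosen only for illustration.

```python
def ngrams(smiles, n=2):
    """Set of character n-grams of a SMILES string.

    Assumption: this is only a toy substitute for a real molecular fingerprint.
    """
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two feature sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0  # two empty feature sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def embed(smiles, landmarks):
    """Coordinates of a molecule: its distance to each landmark molecule."""
    feats = ngrams(smiles)
    return [tanimoto_distance(feats, ngrams(ref)) for ref in landmarks]


# Hypothetical landmark molecules and a small library to embed.
landmarks = ["CCO", "CCCCCC", "c1ccccc1"]
library = ["CCCO", "CCCCC", "Cc1ccccc1"]

embedding = {s: embed(s, landmarks) for s in library}
for smiles, coords in embedding.items():
    print(smiles, [round(c, 3) for c in coords])
```

Each molecule is mapped to a fixed-length vector with one coordinate per landmark, so the vectors can be passed as features to any standard regression model. A practical pipeline would replace the bigram sets with real fingerprints and would typically apply a dimensionality reduction step to the distance vectors.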

Keywords

small data
molecular similarity
multidimensional scaling
molecular encoding
molecular generation
directed evolution
inverse design

Supplementary materials

Title: Supporting Information for MACAW
Description: Supporting figures
