Abstract
Creating a successful small-molecule drug is a challenging multi-parameter optimization problem in an effectively infinite space of possible molecules. Generative models have emerged as powerful tools for traversing data manifolds composed of images, sounds, and text, and offer an opportunity to dramatically improve the drug discovery and design process. To create generative optimization methods that are more useful than brute-force molecular generation and filtering via virtual screening, we propose that four integrated features are necessary: large, quantitative datasets of molecular structure and activity; an invertible vector representation of realistic, accessible molecules; smooth and differentiable regressors that quantify uncertainty; and algorithms to simultaneously optimize properties of interest. Over the course of 12 months, Terray has collected a dataset of 2 billion quantitative binding measurements, which directly motivates multi-parameter generative optimization of molecules conditioned on this data. To this end, we present COATI, a pre-trained, multi-modal encoder-decoder model of druglike chemical space. COATI is constructed without any human biasing of features, using contrastive learning from text and 3D representations of molecules to allow downstream use with structural models. We demonstrate that COATI possesses many of the desired properties of a universal molecular embedding: fixed dimension, invertibility, autoencoding, accurate regression, and low computational cost. Finally, we present a novel metadynamics algorithm for generative optimization and apply it, using a small subset of our proprietary data collected for a model protein, carbonic anhydrase, to design molecules that satisfy the multi-parameter optimization task of potency, solubility, and druglikeness. This work sets the stage for fully integrated generative molecular design and optimization for small molecules.
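The abstract does not specify the exact contrastive objective. As an illustration only, the following Python/NumPy sketch shows one common form of contrastive learning between two modalities, a CLIP-style symmetric InfoNCE loss, under the assumption that row i of `text_emb` and row i of `geom_emb` hold the text-encoder and 3D-encoder embeddings of the same molecule. The function name `clip_contrastive_loss` and the default temperature of 0.07 are hypothetical choices, not taken from COATI.

```python
import numpy as np

# Hypothetical illustration of a CLIP-style contrastive loss; not COATI's training code.

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(text_emb, geom_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    Assumes row i of text_emb and row i of geom_emb describe the same molecule.
    """
    # L2-normalize each row so dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    g = geom_emb / np.linalg.norm(geom_emb, axis=1, keepdims=True)
    logits = t @ g.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(t))
    # Cross-entropy with the matching pair (the diagonal) as the target,
    # computed text-to-3D (rows) and 3D-to-text (columns), then averaged.
    loss_t2g = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_g2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t2g + loss_g2t)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
matched = clip_contrastive_loss(z, z)
mismatched = clip_contrastive_loss(z, np.roll(z, 1, axis=0))
print(matched < mismatched)
```

Minimizing this loss pulls each molecule's two embeddings together and pushes embeddings of different molecules apart, which is what places both encoders in a shared space. When every row is identical, no pair can be distinguished and the loss reduces to log(batch size).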
Supplementary weblinks
GitHub location for COATI models: source code and instructions on how to use COATI models, including examples.