Abstract
Established molecular machine learning models process individual molecules as inputs to predict their biological, chemical, or physical properties. However, such algorithms require large datasets and have not been optimized to predict property differences between molecules. Many drug and material development tasks would benefit from an algorithm that can directly compare two molecules to guide molecular optimization and prioritization. Here, we develop DeepDelta, a pairwise deep learning approach that processes two molecules simultaneously and learns to predict property differences between two molecules from small datasets. On 10 pharmacokinetic benchmark tasks, our DeepDelta approach outperforms two established molecular machine learning algorithms, the message passing neural network (MPNN) ChemProp and Random Forest using radial fingerprints. We further analyze our performance and find that DeepDelta particularly outperforms established approaches at predicting large differences in molecular properties and can perform scaffold hopping. Furthermore, we derive simple computational tests of our models based on mathematical invariants and show that compliance with these tests correlates with overall model performance, providing an innovative, unsupervised, and easily computable measure of expected model performance and applicability. Taken together, DeepDelta provides an accurate approach to predict molecular property differences and will allow for higher fidelity and transparency in molecular optimization for drug development and the chemical sciences.
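As an illustration of the pairwise setup and the invariant-based tests described above, the following is a minimal sketch and not the authors' implementation. The abstract does not list the invariants, so the three used here are assumptions: a molecule paired with itself should give a difference of zero, swapping the order of a pair should flip the sign, and differences should add along a chain of three molecules. The helper names `make_pairs` and `invariant_errors`, the toy property values, and the two stand-in predictors are all hypothetical.

```python
import itertools
import numpy as np

def make_pairs(values):
    """Return all ordered index pairs (i, j) and targets values[j] - values[i]."""
    values = np.asarray(values, dtype=float)
    idx = np.array(list(itertools.product(range(len(values)), repeat=2)))
    deltas = values[idx[:, 1]] - values[idx[:, 0]]
    return idx, deltas

def invariant_errors(predict, n):
    """Mean absolute violation of three assumed invariants for predict(i, j)."""
    # Same-molecule pair: predicted difference should be zero.
    same = np.mean([abs(predict(i, i)) for i in range(n)])
    # Reversed pair: predict(i, j) should equal -predict(j, i).
    reverse = np.mean([abs(predict(i, j) + predict(j, i))
                       for i, j in itertools.permutations(range(n), 2)])
    # Additivity: predict(i, j) + predict(j, k) should equal predict(i, k).
    additive = np.mean([abs(predict(i, j) + predict(j, k) - predict(i, k))
                        for i, j, k in itertools.permutations(range(n), 3)])
    return {"same": same, "reverse": reverse, "additive": additive}

# Toy property values (made up for illustration only).
y = [1.0, 2.5, 0.3, 4.0]
idx, deltas = make_pairs(y)

# Stand-in predictors: an exact one and one with pair-specific noise.
exact = lambda i, j: y[j] - y[i]
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.5, size=(len(y), len(y)))
noisy = lambda i, j: y[j] - y[i] + noise[i, j]

exact_errors = invariant_errors(exact, len(y))
noisy_errors = invariant_errors(noisy, len(y))
```

None of the three checks needs a ground-truth label, which is why such tests can serve as an unsupervised signal: in this sketch the exact predictor satisfies all of them up to floating-point rounding, while the noisy one does not.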
Supplementary materials
Title
Supplementary Figures and Tables
Description
Supplementary figures and tables referenced in the main document.
Supplementary weblinks
Title
Associated GitHub
Description
We provide Python code for evaluating DeepDelta and traditional models on their ability to predict property differences between two molecules, curated data for 10 ADMET property benchmarking training sets and 2 external test sets, and results from 5x10-fold cross-validation that are used in further analysis.