Abstract
Predictive models hold considerable promise for accelerating the discovery of safe, efficacious therapeutics. To better understand and improve the performance of small-molecule predictive models, we conducted multiple experiments with deep learning and traditional machine learning approaches, leveraging our large internal datasets as well as publicly available datasets. These experiments assessed performance on random, temporal, and reverse-temporal data ablation tasks, as well as on tasks testing model extrapolation to different property spaces. We identified factors that contribute to the higher performance of predictive models built using graph neural networks compared with traditional methods such as XGBoost and random forest. Building on these findings, we derived a scaling relationship that accounts for 81% of the variance in model performance across different assays and data regimes. This relationship can be used to estimate the performance of models for ADMET (absorption, distribution, metabolism, excretion, and toxicity) endpoints as well as for drug discovery assay data in general. The results provide insights into how to further improve model performance.