These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

An Analysis of Proteochemometric and Conformal Prediction Machine Learning Protein-Ligand Binding Affinity Models

Submitted on 28.01.2020, 21:39 and posted on 29.01.2020, 11:42 by Conor Parks, Zied Gaieb, Rommie Amaro

Protein-ligand binding affinity is a key pharmacodynamic endpoint in drug discovery. Sole reliance on experimental design-make-test cycles is costly and time-consuming, providing an opportunity for computational methods to assist. Herein, we present results comparing random forest and feed-forward neural network proteochemometric models for their ability to predict pIC50 measurements for held-out generic Bemis-Murcko scaffolds. In addition, we assess the ability of conformal prediction to provide calibrated prediction intervals in both a retrospective and a semi-prospective test, using the recently released Grand Challenge 4 data set as an external test set. Overall, the random forest and deep neural network proteochemometric models show good retrospective performance but degrade in the semi-prospective setting. However, the conformal prediction intervals prove to be well calibrated both retrospectively and semi-prospectively, showing that they can be used to guide hit discovery and lead optimization campaigns.
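
For readers unfamiliar with the term, "calibrated" here means that intervals issued at a nominal confidence level (say 90%) contain the measured value at roughly that rate. The sketch below illustrates one common variant, split (inductive) conformal regression with absolute residuals as the nonconformity score. It is not the authors' implementation: the abstract does not specify the nonconformity measure, and an ordinary least-squares model on synthetic data stands in here for the random forest and neural network models and for the pIC50 measurements. All function and variable names are hypothetical.

```python
import numpy as np


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(residuals)
    # Rank of the order statistic used by split conformal prediction.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this confidence level
    return np.sort(residuals)[k - 1]


def split_conformal_intervals(fit, predict, X_train, y_train,
                              X_cal, y_cal, X_test, alpha=0.1):
    """Fit on the training split, calibrate on a disjoint split, then
    return (lower, upper) interval bounds for the test inputs."""
    model = fit(X_train, y_train)
    cal_residuals = np.abs(y_cal - predict(model, X_cal))
    q = conformal_quantile(cal_residuals, alpha)
    y_hat = predict(model, X_test)
    return y_hat - q, y_hat + q


# Stand-in model (assumption): least squares with an intercept term.
def fit_ols(X, y):
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Synthetic data (assumption): 5 made-up descriptors and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
y = X @ np.array([0.5, -1.0, 0.3, 0.0, 0.8]) + 6.0 + rng.normal(scale=0.7, size=3000)

X_train, y_train = X[:1000], y[:1000]
X_cal, y_cal = X[1000:2000], y[1000:2000]
X_test, y_test = X[2000:], y[2000:]

lower, upper = split_conformal_intervals(
    fit_ols, predict_ols, X_train, y_train, X_cal, y_cal, X_test, alpha=0.1
)

# Calibration check: fraction of test targets that fall inside their interval.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage at 90% nominal confidence: {coverage:.3f}")
```

The coverage guarantee of this construction rests on the calibration and test data being exchangeable. A semi-prospective external test set need not satisfy that assumption, which is why checking calibration on such a set is informative.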


University of California San Diego



Declaration of Conflict of Interest

REA has equity interest in and is a co-founder and scientific advisor of Actavalon, Inc.