ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

A Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Ligand-Target Predictions

preprint
revised on 06.05.2020 and posted on 07.05.2020 by Lewis Mervin, Avid M. Afzal, Ola Engkvist, Andreas Bender
In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three such calibration methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating the prediction scores of ligand-target models built with the Naïve Bayes, Support Vector Machine and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million compound-target data points across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
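
To make the idea of score calibration concrete, the following is a minimal sketch of one of the three methods named in the abstract, Isotonic Regression, written with the pool-adjacent-violators algorithm in plain Python. It is not the authors' implementation: the function names `fit_isotonic` and `predict_isotonic`, the toy scores and labels, and the choice of a step function (rather than interpolation between blocks) are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to probabilities.

    Hypothetical helper: pools tied scores, then applies the
    pool-adjacent-violators algorithm (PAVA).
    """
    # Pool observations that share the same raw score.
    pooled = defaultdict(lambda: [0.0, 0])
    for s, y in zip(scores, labels):
        pooled[s][0] += y
        pooled[s][1] += 1

    # Each block holds [sum of labels, count, highest score in the block].
    blocks = []
    for s in sorted(pooled):
        total, count = pooled[s]
        blocks.append([total, count, s])
        # Merge backwards while the block means are not strictly increasing.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            t, c, upper = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]          # upper score bound of each block
    values = [b[0] / b[1] for b in blocks]   # calibrated probability per block
    return uppers, values


def predict_isotonic(model, score):
    """Map a raw score to the calibrated probability of its block."""
    uppers, values = model
    idx = min(bisect_left(uppers, score), len(values) - 1)
    return values[idx]


# Toy calibration set (illustrative values only): raw scores and activity labels.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
activity = [0, 1, 0, 0, 1, 1]
model = fit_isotonic(raw_scores, activity)
calibrated = [predict_isotonic(model, s) for s in raw_scores]
```

In this sketch the model is fitted on a held-out calibration set of scores and known labels, and the resulting mapping is then applied to the scores of new compounds. Platt Scaling would instead fit a sigmoid to the same pairs, and Venn-ABERS would return a pair of probabilities from two isotonic fits.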

Email Address of Submitting Author

lewis.mervin1@astrazeneca.com

Institution

AstraZeneca

Country

United Kingdom

ORCID For Submitting Author

0000-0002-7271-0824

Declaration of Conflict of Interest

None declared

Version Notes

Fixed an error in Figure 1 and inaccuracies in the description of the inductive (cross-validated) Platt Scaling and Isotonic Regression scaling methods. General improvements to the flow and main body of the text.