These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or reported in news media as established information.

A Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Ligand-Target Predictions

revised on 06.05.2020 and posted on 07.05.2020 by Lewis Mervin, Avid M. Afzal, Ola Engkvist, Andreas Bender
In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three such calibration methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating the prediction scores of Naïve Bayes, Support Vector Machine and Random Forest models for ligand-target prediction, using bioactivity data available at AstraZeneca (40 million compound-target data points across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
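As a rough illustration of what two of the scaling methods named above do, the following minimal sketch fits Platt Scaling (a sigmoid fitted to the raw scores) and Isotonic Regression (a non-decreasing step function fitted with the pool-adjacent-violators algorithm) to synthetic scores. It is not the authors' implementation: the data are randomly generated, the function names (`fit_platt`, `predict_platt`, `fit_isotonic`, `predict_isotonic`) are hypothetical, Platt's target smoothing and the cross-validated (inductive) setup are omitted, and Venn-ABERS is not shown.

```python
import math
import random


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = 1 / (1 + exp(A*s + B)) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # For this parameterisation, d(loss)/dA = (y - p) * s and d(loss)/dB = (y - p).
            grad_a += (y - p) * s
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def predict_platt(a, b, score):
    """Map a raw score to a calibrated probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: return (upper score bound, value) steps, non-decreasing in value."""
    blocks = []  # each block holds [sum of labels, count, largest score in block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def predict_isotonic(model, score):
    """Return the value of the first step whose upper bound covers the score."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]


# Synthetic data (an assumption for illustration): actives tend to receive higher raw scores.
random.seed(0)
labels = [1 if random.random() < 0.3 else 0 for _ in range(300)]
scores = [random.gauss(1.0 if y else -1.0, 1.0) for y in labels]

a, b = fit_platt(scores, labels)
iso = fit_isotonic(scores, labels)

for s in (-2.0, 0.0, 2.0):
    print(s, round(predict_platt(a, b, s), 3), round(predict_isotonic(iso, s), 3))
```

Both mappings preserve the ranking of the raw scores. Platt Scaling imposes a parametric sigmoid shape, whereas Isotonic Regression only assumes that the probability of activity does not decrease as the score increases.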


United Kingdom

Declaration of Conflict of Interest

None declared

Version Notes

Fixed an error in Figure 1 and inaccuracies in the description of the inductive (cross-validated) Platt Scaling and Isotonic Regression scaling methods. General improvements to the flow and main body of the text.