High-throughput screening (HTS) is one of the leading techniques for hit identification in drug discovery. It comprises multiple phases, one primary screen and one or more confirmatory screens, which together produce multi-fidelity data: noisy primary screening data are available for a large number of compounds, and higher-quality confirmatory data for a low-to-moderate number of compounds. Existing computational pipelines do not integrate the primary screening data of individual HTS campaigns, leaving millions of potentially useful data points unused for bioactivity prediction. Furthermore, there is a lack of publicly available multi-fidelity bioactivity benchmarks to support the modelling of real-world HTS data. To address these challenges, we assembled public (PubChem) and private (AstraZeneca) collections of multi-fidelity HTS datasets, totalling over 28 million data points, with many targets possessing more than 1 million labels. We then designed and evaluated machine learning models, including classical models and a novel, bespoke deep learning approach based on graph neural networks, to assess the improvements offered by integrating multi-fidelity data. Jointly modelling primary and confirmatory data led to a 12% decrease in mean absolute error (MAE) and a 152% increase in R-squared on the public datasets, and to a 17% reduction in MAE and a 46% increase in R-squared on the AstraZeneca datasets (averaged across all evaluated methods). We conclude that joint modelling of multi-fidelity HTS data improves predictive performance, and that deep learning enables unique and highly desirable strategies such as leveraging signals from multi-million-scale datasets and transfer learning.
Simplified figures and main text. Some sections were moved to the Supplementary Information.