Multi-fidelity machine learning models for improved high-throughput screening predictions

David Buterez; Jon Paul Janet; Steven Kiddle; Pietro Liò

doi:10.26434/chemrxiv-2022-dsbm5-v2

Theoretical and Computational Chemistry

26 July 2022, Version 2

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

High-throughput screening (HTS) is one of the leading techniques for hit identification in drug discovery. It comprises multiple phases: one primary screen and one or more confirmatory screens, which together yield multi-fidelity data. Noisy primary screening data are available for a large number of compounds, and higher-quality confirmatory data for a low-to-moderate number of compounds. Existing computational pipelines do not integrate the primary screening data of individual HTS campaigns, so millions of potentially useful data points go unused for bioactivity prediction. Furthermore, there is a lack of publicly available multi-fidelity bioactivity benchmarks to support modelling real-world HTS data. To address these challenges, we assembled public (PubChem) and private (AstraZeneca) collections of multi-fidelity HTS datasets, totalling over 28 million data points, with many targets possessing more than 1M labels. We then designed and evaluated machine learning models to assess the improvements offered by the integration of multi-fidelity data, including classical models and a bespoke, novel deep learning approach based on graph neural networks. Jointly modelling primary and confirmatory data led to a decrease of 12% in mean absolute error (MAE) and an increase of 152% in R-squared on the public datasets, and a reduction of 17% in MAE coupled with an uplift of 46% in R-squared on the AstraZeneca datasets (averaged across all evaluated methods). We conclude that joint modelling of multi-fidelity HTS data improves predictive performance and that deep learning enables unique and highly desirable strategies, such as leveraging signals from multi-million-scale datasets and transfer learning.
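
As a rough illustration of how relative figures such as "a decrease of 12% in MAE" or "an increase of 152% in R-squared" can be obtained, the sketch below computes MAE and R-squared for two sets of predictions and reports the percentage change between them. The activity values, the predictions, and the helper names (`mae`, `r_squared`, `relative_change`) are hypothetical and are not taken from the paper. The reported figures are also averaged across all evaluated methods, which this toy example does not attempt to reproduce.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / abs(baseline)


# Hypothetical confirmatory-screen activities (not real assay data).
y_true = [5.0, 6.0, 7.0, 8.0]

# Hypothetical predictions from a model trained on confirmatory data only.
pred_single = [6.0, 6.0, 6.0, 9.0]

# Hypothetical predictions from a model that also uses primary-screen data.
pred_multi = [5.5, 6.0, 6.5, 8.5]

mae_single, mae_multi = mae(y_true, pred_single), mae(y_true, pred_multi)
r2_single, r2_multi = r_squared(y_true, pred_single), r_squared(y_true, pred_multi)

print(f"MAE change: {relative_change(mae_single, mae_multi):+.1f}%")
print(f"R-squared change: {relative_change(r2_single, r2_multi):+.1f}%")
```

A negative MAE change and a positive R-squared change both indicate an improvement. Because R-squared can be close to zero for a weak baseline, relative R-squared changes can be very large even when the absolute gain is modest.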

Keywords

high-throughput screening

concentration response

artificial intelligence

computational

graph neural network

gnn

graph representation learning

support vector machine

Supplementary materials

Title: Supplementary Information

Description: The materials include additional figures and tables, model hyperparameters, and details about the methodology and evaluation.

Version History

Jan 19, 2024 Version 3

Jul 26, 2022 Version 2

May 19, 2022 Version 1

Version Notes

Simplified figures and main text. Some sections were moved to the Supplementary Information.

Metrics

Views: 4,221

Downloads: 2,021

License

The content is available under CC BY 4.0

Funding

AstraZeneca

Author’s competing interest statement

DB's doctoral studies are funded by AstraZeneca. JPJ and SJK are employed by AstraZeneca and potentially hold shares in the company.

Ethics

The author(s) have declared that ethics committee/IRB approval is not relevant to this content.
