Enhanced Sampling of Chemical Space for High Throughput Screening Applications using Machine Learning

Sarvesh Mehta; Siddhartha Laghuvarapu; Yashaswi Pathak; Aaftaab Sethi; Mallika Alvala; U. Deva Priyakumar

doi:10.26434/chemrxiv.14139275.v1

Theoretical and Computational Chemistry


03 March 2021, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as "hits". In such an experiment, each molecule from a large small-molecule drug library is evaluated for a physical property, such as the binding affinity (docking score), against a target receptor. In real-life drug discovery experiments, the drug libraries are extremely large yet still represent only a small fraction of the essentially infinite chemical space, and evaluating the physical property for every molecule in the library is not computationally feasible.

In the current study, a novel machine learning framework, "MEMES", based on Bayesian optimization is proposed for efficient sampling of chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million molecules, while calculating the docking score for only about 6% of the complete library. We believe that such a framework would greatly reduce the computational time and resources required, not only in drug discovery but also in other areas that rely on such high-throughput experiments.
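The abstract does not describe how MEMES works beyond stating that it is based on Bayesian optimization. The following is therefore only a generic sketch of that idea and is not the authors' implementation.

The sketch makes several assumptions, all of them hypothetical:

- a Gaussian-process surrogate with an RBF kernel;
- a lower-confidence-bound acquisition rule, a common textbook choice;
- 2-D feature vectors standing in for molecular descriptors;
- a made-up `toy_docking_score` function in place of a real docking program.

Every name and parameter in the code is illustrative. For scale, docking about 6% of a 100-million-molecule library still means roughly 6 million docking calculations instead of 100 million. Recovering 90% of the top-1000 means finding about 900 of those molecules.

```python
import math
import random

# Hypothetical toy setup (NOT from the paper): each "molecule" is a 2-D
# feature vector and the "docking score" is a made-up function in which
# lower (more negative) values mean better predicted binding.
def toy_docking_score(x):
    d1 = (x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2
    d2 = (x[0] - 0.2) ** 2 + (x[1] - 0.8) ** 2
    return -8.0 * math.exp(-d1 / 0.02) - 4.0 * math.exp(-d2 / 0.05)

def rbf(a, b, length=0.15):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def cholesky(K):
    """Return lower-triangular L with L @ L.T == K (K positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Solve L.T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, Xq, noise=1e-4):
    """Gaussian-process posterior mean and std at Xq, fitted on standardized y."""
    mean = sum(y) / len(y)
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y)) or 1.0
    yc = [(v - mean) / std for v in y]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, yc))  # alpha = K^-1 yc
    out = []
    for q in Xq:
        ks = [rbf(q, x) for x in X]
        mu = sum(k * a for k, a in zip(ks, alpha))
        v = solve_lower(L, ks)
        var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(q, q) == 1
        out.append((mean + std * mu, std * math.sqrt(var)))
    return out

def screen(library, oracle, n_init=15, batch=15, rounds=4, kappa=2.0, seed=0):
    """Batch Bayesian-optimization loop: refit the GP each round, then 'dock'
    the unscored molecules with the lowest lower confidence bound."""
    rng = random.Random(seed)
    scored = {i: oracle(library[i]) for i in rng.sample(range(len(library)), n_init)}
    for _ in range(rounds):
        idx = list(scored)
        pool = [i for i in range(len(library)) if i not in scored]
        preds = gp_predict([library[i] for i in idx], [scored[i] for i in idx],
                           [library[i] for i in pool])
        # Lower confidence bound: predicted mean minus kappa * predicted std.
        ranked = sorted(zip(pool, preds), key=lambda t: t[1][0] - kappa * t[1][1])
        for i, _ in ranked[:batch]:
            scored[i] = oracle(library[i])
    return scored

def top_k_recall(library, oracle, scored, k):
    """Fraction of the true top-k (lowest-score) molecules that were docked."""
    true_top = sorted(range(len(library)), key=lambda i: oracle(library[i]))[:k]
    return sum(1 for i in true_top if i in scored) / k

rng = random.Random(42)
library = [(rng.random(), rng.random()) for _ in range(300)]
scored = screen(library, toy_docking_score)
recall = top_k_recall(library, toy_docking_score, scored, k=15)
print(f"docked {len(scored)}/{len(library)} molecules, top-15 recall = {recall:.2f}")
```

Each round refits the surrogate on every score computed so far. It then spends the next batch of docking calls on the molecules predicted to score best, with `kappa` trading off exploitation of low predicted scores against exploration of uncertain regions. The `top_k_recall` helper measures the same kind of quantity as the abstract's headline figure: the fraction of the true top-k molecules that were actually docked.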

Keywords

Chemical space

Artificial Intelligence

Machine learning

Bayesian optimization

virtual screening

high throughput screening

Supplementary materials

arch5

docking hits SI


Version History

Jul 15, 2021 Version 2

Mar 03, 2021 Version 1

Metrics

Views: 2,965

Downloads: 1,020

License

The content is available under CC BY-NC-ND 4.0


Funding

DST-SERB grant (no. CVD/2020/000343)

Intel Corp. as part of its Pandemic Response Technology Initiative (PRTI)

Author’s competing interest statement

International Institute of Information Technology, Hyderabad has filed a provisional patent application for the use of the MEMES framework in high-throughput screening exercises, with U.D.P., S.M., S.L., and Y.P. listed as inventors.

Provisional Patent Application No.: 202041050608. Status: Awaiting Complete Specification (Provisional Patent Filed).

The funders did not have any role in the design, idea, data collection, analysis, interpretation, writing of the manuscript, or decision to submit it for publication.
