Sample Efficient Reinforcement Learning with Active Learning for Molecular Design

Michael Dodds; Jeff Guo; Thomas Löhr; Alessandro Tibo; Ola Engkvist; Jon Paul Janet

doi:10.26434/chemrxiv-2023-j88dg

Theoretical and Computational Chemistry

08 August 2023, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the vast size of chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporating increasingly complex computational models into these workflows demands greater sample efficiency. Here, we introduce an active learning (AL) system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample efficiency of the optimization process. We identify and characterize unique challenges in combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline RL for simple ligand- and structure-based oracle functions, with a 1,000 to 75,000% increase in hits generated for a fixed oracle budget and a 14- to 65-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and is in principle applicable to any RL domain.
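
To make the general idea of pairing a generator with active learning concrete, the following is a minimal toy sketch and not the authors' RL-AL implementation. Everything in it is an assumption made for illustration: the one-dimensional "molecules", the `oracle` function, the `NearestNeighbourSurrogate` class, the `beta` exploration weight, and the random sampler standing in for an RL agent. It only shows how a cheap surrogate can decide which few candidates per round are sent to an expensive oracle, so that the oracle budget stays fixed.

```python
import random


def oracle(x):
    """Hypothetical expensive scoring function (stand-in for e.g. docking).

    Scores lie in [0.3, 1.0] for x in [0, 1], with the peak at x = 0.7.
    """
    return 1.0 - abs(x - 0.7)


class NearestNeighbourSurrogate:
    """Toy surrogate: predicts the oracle score of the closest labelled point."""

    def __init__(self):
        self.data = []  # (x, score) pairs that the oracle has already labelled

    def add(self, x, score):
        self.data.append((x, score))

    def predict(self, x):
        """Return (predicted score, distance to the nearest labelled point)."""
        if not self.data:
            return 0.0, 1.0  # no information yet: neutral score, high uncertainty
        nearest_x, nearest_score = min(self.data, key=lambda p: abs(p[0] - x))
        return nearest_score, abs(nearest_x - x)


def acquisition(surrogate, x, beta):
    """Upper-confidence-style value: predicted score plus an exploration bonus."""
    mean, distance = surrogate.predict(x)
    return mean + beta * distance


def al_guided_search(rounds=20, batch=64, per_round=4, beta=0.5, seed=0):
    """Run the toy loop; only `per_round` candidates per round reach the oracle."""
    rng = random.Random(seed)
    surrogate = NearestNeighbourSurrogate()
    oracle_calls = 0
    best = float("-inf")
    for _ in range(rounds):
        # Stand-in for sampling a batch of molecules from an RL agent.
        candidates = [rng.random() for _ in range(batch)]
        # Rank the whole batch with the cheap surrogate, keep the top few.
        ranked = sorted(
            candidates,
            key=lambda x: acquisition(surrogate, x, beta),
            reverse=True,
        )
        for x in ranked[:per_round]:
            score = oracle(x)  # the only expensive step
            oracle_calls += 1
            surrogate.add(x, score)  # the new label improves the surrogate
            best = max(best, score)
    return best, oracle_calls


if __name__ == "__main__":
    best, calls = al_guided_search()
    print(f"best score {best:.3f} after {calls} oracle calls")
```

In this sketch the oracle is called `rounds * per_round` times regardless of how many candidates the generator proposes, which is the sense in which the selection step controls the oracle budget. A real system would also feed the scores back into the generator's policy update, which this toy omits.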

Keywords

Reinforcement Learning

Drug Design

Active Learning

REINVENT

License

The content is available under CC BY 4.0

Author’s competing interest statement

T. Löhr, A. Tibo, O. Engkvist and J. P. Janet are employees of, and potentially hold shares in, AstraZeneca.

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
