Traversing Chemical Space with Active Deep Learning

23 November 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Deep learning is accelerating drug discovery. However, current approaches are often limited by the available data, e.g., in terms of size or molecular diversity. Active learning is poised to be a solution for drug discovery in low-data regimes. In active learning, a model is updated iteratively over multiple smaller screening steps, instead of suggesting many molecules at once with a single model as in traditional ‘one-shot’ screening. This iterative approach aims to improve models during the screening process and can adjust course along the way. However, active learning remains relatively underexplored in the molecular sciences. It is currently unclear how active learning compares with traditional approaches and what the best strategies are for prospective drug discovery applications. In this study, we lay the first foundations for the prospective use of active deep learning in low-data scenarios where only dozens of training molecules are available to screen hundreds of thousands of molecules. Our systematic study combines six active learning strategies, two deep learning architectures, and three large-scale molecular libraries. We highlight that active learning can achieve up to a six-fold improvement in hit discovery compared to traditional methods. How molecules are chosen for the next iteration proved to be the primary driver of performance: it is more important than the chosen network architecture in determining the ‘molecular journey’ in chemical space. Remarkably, active learning quickly compensated for a lack of molecular diversity in the starting set, allowing unexplored structural motifs to be charted efficiently. These results set the basis for the adoption of active deep learning to accelerate drug discovery in low-data regimes.
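
To make the contrast between iterative and ‘one-shot’ screening concrete, the following is a minimal toy sketch, not the method used in this study. It assumes random bit-vector "fingerprints", a hypothetical hit rule, a similarity-based score in place of a deep learning model, and greedy selection of the top-scoring molecules as the only acquisition strategy. All function names and parameters are illustrative.

```python
import random


def make_library(n=300, dim=8, seed=0):
    """Toy screening library: random bit-vector 'fingerprints'.
    Hypothetical rule: a molecule is a hit if its first three bits are set."""
    rng = random.Random(seed)
    fps = [tuple(rng.randint(0, 1) for _ in range(dim)) for _ in range(n)]
    labels = [int(fp[0] and fp[1] and fp[2]) for fp in fps]
    return fps, labels


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def score(fp, known_hits, known_misses):
    """Stand-in 'model': mean similarity to known hits minus known non-hits."""
    def mean_sim(refs):
        return sum(tanimoto(fp, r) for r in refs) / len(refs) if refs else 0.0
    return mean_sim(known_hits) - mean_sim(known_misses)


def screen(fps, labels, start_idx, budget, batch_size):
    """Acquire up to `budget` molecules in batches, updating the model
    with the newly revealed labels after every batch.
    Setting batch_size == budget corresponds to one-shot screening."""
    labelled = set(start_idx)
    acquired = []
    while len(acquired) < budget:
        hits = [fps[i] for i in labelled if labels[i]]
        misses = [fps[i] for i in labelled if not labels[i]]
        pool = [i for i in range(len(fps)) if i not in labelled]
        # Greedy acquisition: rank the unscreened pool by predicted score.
        pool.sort(key=lambda i: score(fps[i], hits, misses), reverse=True)
        batch = pool[:min(batch_size, budget - len(acquired))]
        if not batch:  # library exhausted
            break
        acquired.extend(batch)
        labelled.update(batch)  # 'experiment': labels are now known
    return sum(labels[i] for i in acquired)  # number of hits found


if __name__ == "__main__":
    fps, labels = make_library()
    start = list(range(20))  # small starting set
    print("one-shot hits:", screen(fps, labels, start, budget=40, batch_size=40))
    print("iterative hits:", screen(fps, labels, start, budget=40, batch_size=10))
```

Both calls spend the same screening budget; only the number of model updates differs. Whether the iterative run finds more hits in this toy depends on the random library and starting set, so the sketch illustrates the procedure rather than the reported results.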

Keywords

Active learning
drug discovery
molecular deep learning
virtual screening

Supplementary materials

Title: Supplementary for Traversing chemical space with active deep learning
Description: Supplementary information for 'Traversing chemical space with active deep learning' containing performance figures on datasets and methods not mentioned in the main text, along with supporting tables.
