Assessing Methods and Obstacles in Chemical Space Exploration

21 August 2020, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


Benchmarking the performance of generative methods for drug design is complex and multifaceted. In this report, we propose a separation of concerns for de novo drug design that divides the task into three main categories: generation, discrimination, and exploration. We demonstrate that changes to any of these three concerns affect benchmark performance on drug design tasks. We also present Deriver, an open-source Python package that acts as a modular framework for molecule generation, with a focus on integrating multiple generative methods. Using Deriver, we show that changing the parameters associated with each concern significantly affects how chemical space is traversed, and that the freedom to adjust each concern independently is critical for real-world applications with conflicting priorities. We find that combining multiple generative methods can improve the optimization of molecular properties and lower the chance of becoming trapped in local minima. Additionally, filtering molecules for drug-likeness (based on physicochemical properties and SMARTS pattern matching) before they are scored can hinder exploration, but can improve the quality of the final molecules. Finally, we show that any given task has an exploration algorithm best suited to it, although in practice linear probabilistic sampling generally produces the best outcomes compared with Monte Carlo sampling or greedy sampling. We hope that Deriver, which is freely available, will help others who wish to collaboratively improve existing methods in de novo drug design centered on inheritance of molecular structure, modularity, extensibility, and separation of concerns.
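To make the contrast between exploration strategies concrete, the following is a minimal sketch of greedy selection versus one possible form of linear probabilistic sampling over a population of scored molecules. It does not use Deriver's API. The function names `greedy_select` and `linear_probabilistic_select` are hypothetical. The reading of "linear probabilistic sampling" as selection weights that fall linearly with score rank is our assumption, not a description of Deriver's implementation. Monte Carlo sampling is not shown.

```python
import random
from typing import List, Tuple

# A scored molecule: (SMILES string, score); higher scores are better.
Scored = Tuple[str, float]


def greedy_select(scored: List[Scored], k: int) -> List[str]:
    """Keep the k highest-scoring molecules (deterministic)."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [smiles for smiles, _ in ranked[:k]]


def linear_probabilistic_select(scored: List[Scored], k: int,
                                rng: random.Random) -> List[str]:
    """Draw up to k distinct molecules, weighted linearly by score rank.

    ASSUMPTION (hypothetical scheme): of n molecules, the best-ranked gets
    weight n and the worst gets weight 1, so weaker molecules keep a
    nonzero chance of seeding the next generation.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    n = len(ranked)
    pool = [(smiles, float(n - i)) for i, (smiles, _) in enumerate(ranked)]
    chosen: List[str] = []
    for _ in range(min(k, n)):
        total = sum(weight for _, weight in pool)
        r = rng.uniform(0.0, total)
        acc = 0.0
        for idx, (smiles, weight) in enumerate(pool):
            acc += weight
            if r <= acc:
                chosen.append(smiles)
                pool.pop(idx)  # sample without replacement
                break
    return chosen


if __name__ == "__main__":
    population = [("CCO", 0.2), ("c1ccccc1", 0.9), ("CC(=O)O", 0.5), ("CCN", 0.1)]
    print(greedy_select(population, 2))  # ['c1ccccc1', 'CC(=O)O']
    print(linear_probabilistic_select(population, 2, random.Random(0)))
```

Greedy selection always returns the same top molecules. Repeated runs can therefore converge on a local optimum. The rank-weighted draw still favors high scorers but occasionally keeps weaker candidates, which is one way a sampler can trade some short-term score for broader exploration of chemical space.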


de novo drug design
chemical space
molecular generators
molecule design
genetic algorithm
benchmark datasets

