Deep Generative Models Enable Navigation in Sparsely Populated Chemical Space

Michael A. Skinnider; R. Greg Stacey; David S. Wishart; Leonard J. Foster

doi:10.26434/chemrxiv.13638347.v1

Theoretical and Computational Chemistry

27 January 2021, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Deep generative models are powerful tools for the exploration of chemical space, enabling the on-demand generation of molecules with desired physical, chemical, or biological properties. However, these models are typically thought to require training datasets comprising hundreds of thousands, or even millions, of molecules. This perception limits the application of deep generative models in regions of chemical space populated by only a small number of examples. Here, we systematically evaluate and optimize generative models of molecules for low-data settings. We carry out a series of systematic benchmarks, training more than 5,000 deep generative models and evaluating over 2.6 billion generated molecules. We find that robust models can be learned from far fewer examples than has been widely assumed. We further identify strategies that dramatically reduce the number of molecules required to learn a model of equivalent quality, and demonstrate the application of these principles by learning models of chemical structures found in bacterial, plant, and fungal metabolomes. The structure of our experiments also allows us to benchmark the metrics used to evaluate generative models themselves. We find that many of the most widely used metrics in the field fail to capture model quality, but identify a subset of well-behaved metrics that provide a sound basis for model development. Collectively, our work provides a foundation for directly learning generative models in sparsely populated regions of chemical space.

Now Published

Chemical language models enable navigation in sparsely populated chemical space

Michael A. Skinnider, R. Greg Stacey, David S. Wishart, Leonard J. Foster

Journal article

Nature Machine Intelligence, Volume 3, Issue 9

Online publication date: Jul 19, 2021

Version History

Jan 27, 2021 Version 1

Metrics

Views: 3,641

Downloads: 1,664

License

The content is available under CC BY-NC-ND 4.0

Funding

Genome Canada (214PRO)

Genome Canada (284MBO)

Author’s competing interest statement

No conflict of interest.
