ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Comparative Study of Deep Generative Models on Chemical Space Coverage

preprint
submitted on 13.11.2020, 10:38 and posted on 16.11.2020, 08:02 by Jie Zhang, Rocío Mercado, Ola Engkvist, Hongming Chen

In recent years, deep molecular generative models have emerged as novel methods for de novo molecular design. Driven by rapid advances in deep learning, architectures such as recurrent neural networks, generative autoencoders, and generative adversarial networks have been employed to construct generative models. However, the metrics used so far to evaluate these models are not discriminative enough to separate the performance of various state-of-the-art generative models. This work presents a novel metric for evaluating deep molecular generative models. The metric is based on the chemical space coverage of a reference database: it compares not only the molecular structures but also the ring systems and functional groups reproduced from a reference dataset of 1M structures. In this study, the performance of seven different molecular generative models was compared by calculating their structure and substructure coverage of the GDB-13 database, using a 1M subset of GDB-13 for training. Our results show that, under the benchmarking metrics introduced herein, performance varies significantly between generative models, so that their generalization capabilities can be clearly differentiated. Additionally, the models were compared on their coverage of the ring systems and functional groups present in GDB-13. Our study provides a useful new metric for evaluating and comparing generative models.
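The coverage idea described in the abstract can be sketched as a set intersection. This is a minimal illustration, not the authors' implementation: it assumes each molecule (or each extracted ring system or functional group) is already represented by a canonical string key, such as a canonical SMILES produced by a cheminformatics toolkit. The function name `coverage` and the optional `exclude` argument, which leaves training molecules out of the reference set, are hypothetical choices made for this sketch.

```python
from typing import Iterable, Set


def coverage(generated: Iterable[str], reference: Iterable[str],
             exclude: Iterable[str] = ()) -> float:
    """Fraction of the reference set (minus `exclude`) found in `generated`.

    Assumption: every item is already a canonical key (e.g. canonical SMILES,
    or a ring-system / functional-group key). No canonicalization happens here,
    so two spellings of the same molecule would count as different items.
    """
    excluded: Set[str] = set(exclude)
    # Deduplicate the reference and optionally drop items seen in training.
    ref: Set[str] = set(reference) - excluded
    if not ref:
        raise ValueError("reference set is empty after exclusion")
    # Duplicated or out-of-reference generated items do not raise coverage.
    gen: Set[str] = set(generated)
    return len(gen & ref) / len(ref)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from GDB-13.
    reference = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
    training = ["CCO"]
    generated = ["CCO", "CCN", "CCN", "C1CC1"]
    print(coverage(generated, reference))                            # → 0.5
    print(round(coverage(generated, reference, exclude=training), 3))  # → 0.333
```

Because the function only sees keys, the same call can report ring-system or functional-group coverage once those substructure keys have been extracted from each molecule with a toolkit of choice.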

Email Address of Submitting Author

zhang_jie@grmh-gdl.com

Institution

Bioland laboratory

Country

China

ORCID For Submitting Author

0000-0002-5575-303X

Declaration of Conflict of Interest

There are no conflicts of interest.
