Are Learned Molecular Representations Ready for Prime Time?

Kevin Yang; Kyle Swanson; Wengong Jin; Connor Coley; Philipp Eiden; Hua Gao; Angel Guzman-Perez; Timothy Hopper; Brian P. Kelley; Miriam Mathea; Andrew Palmer; Volker Settels; Tommi S. Jaakkola; Klavs F. Jensen; Regina Barzilay

doi:10.26434/chemrxiv.7940594.v2

Theoretical and Computational Chemistry

16 July 2019, Version 2

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors, and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial datasets spanning a wide variety of chemical endpoints. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary datasets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.

Keywords

Message passing algorithm

Neural Networks QSAR

Machine Learning Techniques

Property Predictions

Supplementary materials

Supporting-Information-Are-Learned-Molecular-Representations-Ready-for-Prime-Time
