Uncertain Times Call for Quantitative Uncertainty Metrics: Controlling Error in Neural Network Predictions for Chemical Discovery

Jon Paul Janet; Chenru Duan; Tzuhsiung Yang; Aditya Nandy; Heather Kulik

doi:10.26434/chemrxiv.7900277.v1

Theoretical and Computational Chemistry

27 March 2019, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model’s domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds that of widely used uncertainty metrics, and the approach is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
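The abstract does not spell out how the latent-space distance is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that latent vectors (for example, activations of a network's last hidden layer) have already been extracted for the training set and for new candidates, and it takes the mean Euclidean distance to the k nearest training points as the distance to available data. The function names `latent_distance` and `within_cutoff`, the choice of k, and the toy arrays are all hypothetical.

```python
import numpy as np


def latent_distance(train_latent, query_latent, k=5):
    """Mean Euclidean distance from each query point to its k nearest
    training points in latent space (assumed definition, see lead-in)."""
    train = np.asarray(train_latent, dtype=float)
    query = np.asarray(query_latent, dtype=float)
    # Pairwise distances with shape (n_query, n_train).
    diffs = query[:, None, :] - train[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Never ask for more neighbors than there are training points.
    k = min(k, train.shape[0])
    # After partitioning, the first k columns hold the k smallest distances.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def within_cutoff(distances, cutoff):
    """Boolean mask of predictions close enough to training data to trust."""
    return np.asarray(distances) <= cutoff


# Toy latent vectors (placeholders, not data from the paper).
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
query = np.array([[0.0, 0.0], [10.0, 10.0]])

d = latent_distance(train, query, k=2)
trusted = within_cutoff(d, cutoff=1.0)
print(d, trusted)
```

Under these assumptions, tightening `cutoff` keeps only candidates that lie near the training data in latent space, which is the mechanism the abstract describes for controlling prediction error; candidates that fall outside the cutoff could instead be flagged as useful points for active learning.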

Keywords

uncertainty quantification

machine learning

transition metal chemistry

chemical discovery

Supplementary materials

SupportingInformation

dft-results

geometries

models

predictions

Version History

May 15, 2019 Version 2

Mar 27, 2019 Version 1

Metrics

Views: 5,287

Downloads: 1,812

License

The content is available under CC BY NC ND 4.0

Funding

DARPA grant D18AP00039

Office of Naval Research grant N00014-18-1-2434

Author’s competing interest statement

The authors declare no conflict of interest.
