Data-Driven Approaches Can Overcome Limitations in Multireference Diagnostics

Chenru Duan; Fang Liu; Aditya Nandy; Heather Kulik

doi:10.26434/chemrxiv.12115944.v1

Theoretical and Computational Chemistry

13 April 2020, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

High-throughput computational screening typically employs methods such as density functional theory (DFT) that can fail to describe challenging molecules, for example those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice, but it remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to affect the predictive power of SR calculations, but studies on small data sets have reached conflicting conclusions about diagnostic performance. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3,165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Over this set, we also observe conflicting MR character assignments and low pairwise linear correlations among diagnostics. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %E_corr. None of the DFT-based diagnostics is nearly as predictive of %E_corr as the best WFT-based diagnostics. To overcome this cost–accuracy trade-off, we develop machine learning (ML) models, specifically kernel ridge regression (KRR), that predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate with MR effects as well as their computed (i.e., WFT) values do, significantly improving over the DFT-based diagnostics used as model inputs. These ML models thus provide a promising approach to improving upon DFT-based diagnostic accuracy while remaining low cost enough for high-throughput screening.
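The abstract rests on three quantitative ingredients: a percent-recovery metric for the correlation energy, linear (Pearson) correlation as a measure of agreement, and kernel ridge regression that maps cheap inputs to an expensive target. The Python sketch below illustrates all three on synthetic toy data, assuming NumPy is available. It is not the authors' code. The %E_corr helper uses a generic definition, %E_corr = 100 × E_corr(approx) / E_corr(ref) with E_corr(method) = E(method) − E(HF), which may differ from the exact methods the paper compares. The Gaussian (RBF) kernel, the hyperparameters `lam` and `gamma`, and all names (`percent_corr_recovery`, `KernelRidgeSketch`, `pearson_r`) are hypothetical choices for illustration, and two random features stand in for the DFT-based diagnostics and the paper's 3D geometric representation.

```python
import numpy as np


def percent_corr_recovery(e_hf, e_approx, e_ref):
    """Percent of the reference correlation energy recovered by an approximate method.

    Generic definition (an assumption, not necessarily the paper's):
    E_corr(method) = E(method) - E(HF); %E_corr = 100 * E_corr(approx) / E_corr(ref).
    """
    return 100.0 * (e_approx - e_hf) / (e_ref - e_hf)


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


class KernelRidgeSketch:
    """Minimal kernel ridge regression: solve (K + lam * I) alpha = y."""

    def __init__(self, lam=1e-3, gamma=2.0):
        self.lam = lam      # ridge (regularization) strength, hypothetical value
        self.gamma = gamma  # RBF kernel width parameter, hypothetical value

    def fit(self, x, y):
        self.x_train = np.asarray(x, dtype=float)
        k = rbf_kernel(self.x_train, self.x_train, self.gamma)
        self.alpha = np.linalg.solve(k + self.lam * np.eye(len(k)), np.asarray(y, dtype=float))
        return self

    def predict(self, x):
        # Prediction is a kernel-weighted sum over the training points
        return rbf_kernel(np.asarray(x, dtype=float), self.x_train, self.gamma) @ self.alpha


def pearson_r(a, b):
    """Pearson (linear) correlation coefficient between two 1D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic toy data, NOT from the paper: two "cheap" features and an
# "expensive" target that depends on them nonlinearly.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = x[:, 0] * x[:, 1] + 0.5 * x[:, 0] ** 2
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

model = KernelRidgeSketch(lam=1e-3, gamma=2.0).fit(x_train, y_train)
r_krr = pearson_r(model.predict(x_test), y_test)
r_raw = max(abs(pearson_r(x_test[:, j], y_test)) for j in range(2))
e_pct = percent_corr_recovery(e_hf=-100.0, e_approx=-100.45, e_ref=-100.5)

print(f"KRR prediction vs. target, Pearson r: {r_krr:.3f}")
print(f"best raw feature vs. target, |r|:     {r_raw:.3f}")
print(f"toy %E_corr: {e_pct:.1f}")  # 90.0 for these made-up energies
```

Because the toy target's linear correlation with each raw feature is close to zero, the KRR predictions correlate far better with the target than either input does. This is only loosely analogous to the abstract's claim that learned predictions can outperform the cheap diagnostics used as inputs. In practice, `lam` and `gamma` would be tuned by cross-validation rather than fixed by hand.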

Keywords

multireference diagnostics

multireference character

theoretical chemistry

correlation energy

strong correlation

electronic structure

small organic molecules

Supplementary materials

A MRML-I TOC HJK v2

SI MRML1 Data 04112020

SI MRML1 v6

Now Published

Data-Driven Approaches Can Overcome the Cost–Accuracy Trade-Off in Multireference Diagnostics

Chenru Duan, Fang Liu, Aditya Nandy, Heather J. Kulik

Journal of Chemical Theory and Computation, Volume 16, Issue 7

Online publication date: Jun 14, 2020

Version History

Apr 13, 2020 Version 1

Metrics

Views: 2,983

Downloads: 900

License

The content is available under CC BY-NC-ND 4.0.

Funding

ONR Grant N00014-17-1-2956

ONR Grant N00014-18-1-2434

ONR Grant N00014-20-1-2150

Basic Energy Sciences, DE-SC0018096: Simultaneous mitigation of density and energy errors in approximate DFT for transition metal chemistry (https://app.dimensions.ai/details/grant/grant.7065085)

MolSSI Postdoctoral Fellowship

Burroughs Wellcome Fund Career Award at the Scientific Interface

AAAS Marion Mason Milligan Award

Author’s competing interest statement

The authors declare no conflict of interest.
