ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Data-Driven Approaches Can Overcome Limitations in Multireference Diagnostics

preprint
submitted on 12.04.2020 and posted on 13.04.2020 by Chenru Duan, Fang Liu, Aditya Nandy, Heather Kulik
High-throughput computational screening typically employs methods (e.g., density functional theory, or DFT) that can fail to describe challenging molecules, such as those with strongly correlated electronic structure. In such cases, multireference (MR) correlated wavefunction theory (WFT) would be the appropriate choice, but it remains more challenging to carry out and automate than single-reference (SR) WFT or DFT. Numerous diagnostics have been proposed for identifying when MR character is likely to affect the predictive power of SR calculations, but conflicting conclusions about diagnostic performance have been reached on small data sets. We compute 15 MR diagnostics, ranging from affordable DFT-based to more costly MR-WFT-based diagnostics, on a set of 3,165 equilibrium and distorted small organic molecules containing up to six heavy atoms. Over this set, we also observe conflicting MR character assignments and low pairwise linear correlations among the diagnostics. We evaluate the ability of existing diagnostics to predict the percent recovery of the correlation energy, %Ecorr. None of the DFT-based diagnostics is nearly as predictive of %Ecorr as the best WFT-based diagnostics. To overcome this cost–accuracy trade-off, we develop machine learning (ML) models, specifically kernel ridge regression, that predict WFT-based diagnostics from a combination of DFT-based diagnostics and a new, size-independent 3D geometric representation. The ML-predicted diagnostics correlate with MR effects as well as their computed (i.e., WFT) values do, significantly improving over the DFT-based diagnostics on which the models were trained. These ML models thus offer a promising way to improve upon the accuracy of DFT-based diagnostics while remaining low enough in cost for high-throughput screening.
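The kernel ridge regression (KRR) step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The Gaussian (RBF) kernel, the hyperparameters `lam` and `gamma`, and the synthetic feature matrix (a hypothetical stand-in for DFT-based diagnostics combined with geometric descriptors) are all assumptions. The helper `percent_ecorr` is also hypothetical: it assumes %Ecorr is the percentage of a higher-level method's correlation energy (measured from the Hartree–Fock reference) that a lower-level method recovers, and the specific method pair is not stated here.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class KernelRidge:
    """Minimal kernel ridge regression: solve (K + lam * I) alpha = y."""

    def __init__(self, lam=1e-3, gamma=1.0):
        self.lam = lam      # ridge (regularization) strength, assumed value
        self.gamma = gamma  # RBF kernel width parameter, assumed value

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = K.shape[0]
        # Closed-form KRR dual coefficients.
        self.alpha = np.linalg.solve(K + self.lam * np.eye(n), np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        K_test = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K_test @ self.alpha


def percent_ecorr(e_ref, e_low, e_high):
    """Hypothetical %Ecorr: share (in %) of the high-level correlation energy
    recovered by a lower-level method, both measured from the reference energy."""
    return 100.0 * (e_low - e_ref) / (e_high - e_ref)


if __name__ == "__main__":
    # Synthetic data only: rows are "molecules", columns are hypothetical features.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(250, 3))
    y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 - X[:, 2]  # stand-in for a WFT diagnostic
    model = KernelRidge(lam=1e-4, gamma=5.0).fit(X[:200], y[:200])
    pred = model.predict(X[200:])
    # Judge predictions by Pearson correlation with held-out targets.
    r = np.corrcoef(pred, y[200:])[0, 1]
    print(f"held-out Pearson r = {r:.3f}")
```

KRR has a closed-form solution, which keeps training simple for data sets of a few thousand molecules. The final Pearson correlation mirrors how the abstract judges the predicted diagnostics: by how well they correlate with MR effects, here replaced by a synthetic target.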

Funding

ONR Grant N00014-17-1-2956

ONR Grant N00014-18-1-2434

ONR Grant N00014-20-1-2150

Simultaneous mitigation of density and energy errors in approximate DFT for transition metal chemistry

Basic Energy Sciences


MolSSI Postdoctoral Fellowship

Burroughs Wellcome Fund Career Award at the Scientific Interface

AAAS Marion Mason Milligan Award


Email Address of Submitting Author

hjkulik@mit.edu

Institution

Massachusetts Institute of Technology

Country

United States

ORCID For Submitting Author

0000-0001-9342-0191

Declaration of Conflict of Interest

The authors declare no conflict of interest.
