Semi-Supervised Machine Learning Enables the Robust Detection of Multireference Character at Low Cost

02 July 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Multireference (MR) diagnostics are common tools for identifying strongly correlated electronic structure that makes single reference (SR) methods (e.g., density functional theory or DFT) insufficient for accurate property prediction. However, MR diagnostics typically require computationally demanding correlated wavefunction theory (WFT) calculations, and diagnostics often disagree or fail to predict MR effects on properties. To overcome these challenges, we introduce a semi-supervised machine learning (ML) approach with virtual adversarial training (VAT) of an MR classifier using 15 WFT and DFT MR diagnostics as inputs. In semi-supervised learning, only the most extreme SR or MR points are labeled, and the remaining point labels are learned. The resulting VAT model outperforms the alternatives, as quantified by the distinct property distributions of SR- and MR-classified molecules. To reduce the cost of generating inputs to the VAT model, we leverage the VAT model’s robustness to noisy inputs by replacing WFT MR diagnostics with regression predictions in a MR decision engine workflow that preserves excellent performance. We demonstrate the transferability of our approach to larger molecules and those with distinct chemical composition from the training set. This MR decision engine demonstrates promise as a low-cost, high-accuracy approach to the automatic detection of strong correlation for predictive high-throughput screening.

Keywords

strong correlation
machine learning
multireference character
multireference diagnostics
classifier models

Supplementary materials

Title
Description
Actions
Title
ASemiSupervisedLearningApproach
Description
Actions
Title
SI MRML2 v12
Description
Actions
Title
SI MRML2 Data 06242020
Description
Actions

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.