These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Automation of Active Space Selection for Multireference Methods via Machine Learning on Chemical Bond Dissociation

submitted on 15.12.2019, 19:55 and posted on 20.12.2019, 08:52 by WooSeok Jeong, Samuel J. Stoneburner, Daniel King, Ruye Li, Andrew Walker, Roland Lindh, Laura Gagliardi
Predicting and understanding the chemical bond is one of the major challenges of computational quantum chemistry. Kohn-Sham density functional theory (KS-DFT) is the most commonly used method, but approximate density functionals may not be able to describe systems in which multiple electronic configurations are equally important. Multiconfigurational wave functions, on the other hand, can provide a detailed understanding of the electronic structure and chemical bonding of such systems. In the complete-active-space self-consistent field (CASSCF) method, one performs a full configuration interaction calculation in an active space consisting of active electrons and active orbitals. However, CASSCF and its variants require the user to select these active spaces. This choice is not black-box: it requires significant experience and testing, so active space methods are not considered particularly user-friendly and are employed by only a minority of quantum chemists. Our goal is to popularize these methods by making it easier to choose good active spaces. We present a machine learning protocol that performs an automated selection of active spaces for chemical bond dissociation calculations of main group diatomic molecules. The protocol shows high prediction performance for a given target system as long as a properly correlated system is chosen for training. Good active spaces are predicted with a considerably better success rate than random guessing (larger than 80% precision for most systems studied). Our automated machine learning protocol shows that a “black-box” mode is possible for facilitating and accelerating large-scale calculations on multireference systems where single-reference methods such as KS-DFT cannot be applied.
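The comparison between prediction precision and a random guess can be illustrated with a minimal Python sketch. This is not the authors' protocol: the function names and the labels below are hypothetical, invented only to show how precision (the fraction of predicted-good active spaces that are actually good) relates to the expected precision of random guessing, which equals the fraction of good active spaces in the candidate pool.

```python
def precision(predicted_good, actually_good):
    """Fraction of active spaces predicted as good that are actually good."""
    true_pos = sum(1 for p, a in zip(predicted_good, actually_good) if p and a)
    pred_pos = sum(1 for p in predicted_good if p)
    if pred_pos == 0:
        raise ValueError("no active space was predicted as good")
    return true_pos / pred_pos


def random_guess_precision(actually_good):
    """Expected precision of random guessing: the share of good candidates."""
    return sum(actually_good) / len(actually_good)


# Hypothetical labels for ten candidate active spaces (illustration only,
# not data from the preprint).
actual = [True, False, False, True, False, False, False, True, False, False]
predicted = [True, False, False, True, True, False, False, True, False, False]

model_p = precision(predicted, actual)
baseline_p = random_guess_precision(actual)
print(f"model precision:        {model_p:.2f}")
print(f"random-guess precision: {baseline_p:.2f}")
```

A classifier is useful for this task only if its precision clearly exceeds the random-guess baseline for the same pool of candidate active spaces.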


This work was performed at the University of Minnesota and was supported as part of the Nanoporous Materials Genome Center, funded by the U.S. Department of Energy, Office of Basic Energy Sciences, under Award DE-FG02-17ER16362, as part of the Computational Chemical Sciences Program. Computer resources were provided by the Minnesota Supercomputing Institute at the University of Minnesota.

R.L. acknowledges the Swedish Research Council (grant 2016-03398) and the Olle Engkvist Foundation (grant 18-2006).


University of Minnesota


United States

Declaration of Conflict of Interest

No conflict of interest.