Theoretical and Computational Chemistry

Stacked Ensemble Machine Learning for Range-Separation Parameters



High-throughput virtual materials and drug discovery based on density functional theory has achieved tremendous success in recent decades, but its predictive power for organic semiconducting molecules suffered catastrophically from self-interaction error until optimally tuned range-separated hybrid (OT-RSH) exchange–correlation functionals were developed. The accurate but expensive first-principles OT-RSH transitions from a short-range (semi-)local functional to long-range Hartree–Fock exchange at a distance characterized by the inverse of a molecule-specific, non-empirically determined range-separation parameter (ω). In the present study, we propose a stacked ensemble machine learning model that provides an accelerated alternative to OT-RSH based on system-dependent structural and electronic configurations. We trained ML-ωPBE, the first functional in our series, on a database of 1,970 organic semiconducting molecules with sufficient structural diversity, and assessed its accuracy and efficiency on another 1,956 molecules. Compared with the first-principles OT-ωPBE, ML-ωPBE reached a mean absolute error of 0.00504 a_0^{-1} for the optimal value of ω, reduced the computational cost for the test set by 2.66 orders of magnitude, and achieved comparable predictive power for various optical properties.
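To illustrate how ω sets the crossover distance described above, the following minimal sketch assumes the standard error-function partition of the Coulomb operator, 1/r = erfc(ωr)/r + erf(ωr)/r, which is commonly used in ωPBE-type functionals; the abstract itself does not state the partition. The function names and the value ω = 0.2 a_0^{-1} are hypothetical and chosen only for illustration.

```python
import math


def coulomb_split(r, omega):
    """Split 1/r into short- and long-range parts (atomic units).

    Assumes the error-function partition:
        1/r = erfc(omega * r) / r + erf(omega * r) / r
    r is the interelectronic distance in bohr, omega is in 1/bohr.
    """
    short_range = math.erfc(omega * r) / r  # treated with the (semi-)local functional
    long_range = math.erf(omega * r) / r    # treated with Hartree-Fock exchange
    return short_range, long_range


def long_range_fraction(r, omega):
    """Fraction of 1/r assigned to the long-range part at distance r."""
    return math.erf(omega * r)


# Hypothetical range-separation parameter, for illustration only.
omega = 0.2  # 1/bohr

for r in (0.5, 1.0 / omega, 20.0):
    sr, lr = coulomb_split(r, omega)
    frac = long_range_fraction(r, omega)
    print(f"r = {r:5.1f} bohr: short = {sr:.4f}, long = {lr:.4f}, "
          f"long-range fraction = {frac:.3f}")
```

Under this assumed partition, the long-range fraction at r = 1/ω equals erf(1), about 0.84, regardless of ω, so a larger ω moves the switch to Hartree–Fock exchange to shorter distances.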



Supplementary material

Supporting Information
Brief proof of Koopmans' theorem and of the asymptotic decay of the electronic density; detailed descriptions of the general OT-ωPBE and ML-ωPBE functionals, the composite molecular descriptors, the SEML model, and the quantum chemical calculations; and statistical summaries of the errors of ML-ωPBE and other XC functionals in optical properties.
dataset.xlsx
SMILES strings and ω values for all molecules in the training and test sets.