Abstract
Machine-learned interatomic potentials (MLIPs) are reshaping computational chemistry practice because they can overcome the usual tradeoff between accuracy and accessible length and time scales. This efficiency is only impactful, however, when an MLIP either uniquely enables insight into a target system or is broadly transferable beyond its training dataset, and models achieving the latter are seldom reported. In this work, we present the second generation of our atoms-in-molecules neural network potential (AIMNet2), which is applicable to species composed of up to 14 chemical elements in both neutral and charged states, making it a valuable method for modeling the majority of non-metallic compounds. Using an exhaustive dataset of 2 × 10⁷ quantum chemical calculations at the hybrid DFT level of theory, AIMNet2 combines ML-parameterized short-range and physics-based long-range terms to attain generalizability that reaches from simple organics to diverse molecules with “exotic” element-organic bonding. We show that AIMNet2 outperforms semi-empirical GFN-xTB and is on par with reference density functional theory for interaction energy contributions, conformer search tasks, torsion rotation profiles, and molecular-to-macromolecular geometry optimization. Overall, the demonstrated chemical coverage and computational efficiency of AIMNet2 are a significant step toward providing access to MLIPs that avoid the crucial limitation of curating additional quantum chemical data and retraining with each new application.
Supplementary materials
Title
Supplementary Information
Description
Supplementary Table 1: Number of molecules and conformers in training and test datasets.
Supplementary Figure 1: Distribution of molecule sizes in training and test datasets.
Supplementary Figure 2: Distribution of elements in training and test datasets.
Supplementary Figure 3: Distribution of molecular charges for training and test datasets.
Supplementary Note 1: Diverse element-organic CSD benchmark set.
Supplementary Table 2: Benchmark performance statistics of GFN2-xTB and two AIMNet2 variants against experimentally observed geometries in the diverse-element CSD conformation benchmark set.
Supplementary Figure 4: Distribution of RMSD for dihedral angles of GFN2-xTB and two AIMNet2 variants against experimentally observed geometries in the diverse-element CSD conformation benchmark set.
Supplementary Note 2: CSD conformer benchmark set.
Supplementary Table 3: Benchmark performance of various methods on the CSD conformer benchmark set.
Supplementary Figure 5: Distribution of RMSE and MAE errors for various
Supplementary Table 4: MAE for energy predictions (kcal mol⁻¹) on GMTKN55 subsets.
Supplementary weblinks
Title
AIMNet2 Training Datasets
Description
The datasets contain molecular structures and properties computed with the B97-3c (GGA DFT) or wB97M-def2-TZVPP (range-separated hybrid DFT) methods. Each data file contains about 20M structures. DFT calculations were performed with the ORCA 5.0.3 software. Properties include energies, forces, atomic charges, and molecular dipole and quadrupole moments.