Machine-learned interatomic potentials (MLIPs) are reshaping computational chemistry practice because they can move well beyond the conventional tradeoff between accuracy and accessible length and time scales. Despite this appeal, such efficiency is only impactful when an MLIP uniquely enables insight into a target system or is broadly transferable outside its training dataset, and models achieving the latter are seldom reported. In this work, we present the 2nd generation of our atoms-in-molecules neural network potential (AIMNet2), which is applicable to species composed of up to 14 chemical elements in both neutral and charged states, making it a valuable method for modeling the majority of non-metallic compounds. Using an exhaustive dataset of 2 × 10⁷ quantum chemical calculations at the hybrid DFT level of theory, AIMNet2 combines ML-parameterized short-range and physics-based long-range terms to attain generalizability that reaches from simple organics to diverse molecules with “exotic” element-organic bonding. We show that AIMNet2 outperforms semi-empirical GFN-xTB and is on par with reference density functional theory for interaction energy contributions, conformer search tasks, torsion rotation profiles, and molecular-to-macromolecular geometry optimization. Overall, the demonstrated chemical coverage and computational efficiency of AIMNet2 are a significant step toward providing access to MLIPs that avoid the crucial limitation of curating additional quantum chemical data and retraining with each new application.
Supplementary materials
Supplementary Information
Supplementary Table 1: Number of molecules and conformers in training and test datasets.
Supplementary Figure 1: Distribution of molecule sizes in training and test datasets.
Supplementary Figure 2: Distribution of elements in training and test datasets.
Supplementary Figure 3: Distribution of molecular charges for training and test datasets.
Supplementary Note 1: Diverse element-organic CSD benchmark set.
Supplementary Table 2: Benchmark performance statistics of GFN2-xTB and two AIMNet2 variants against experimentally observed geometries with diverse element CSD conformation benchmark set.
Supplementary Figure 4: Distribution of RMSD for dihedral angles of GFN2-xTB and two AIMNet2 variants against experimentally observed geometries in diverse element CSD conformation benchmark set.
Supplementary Note 2: CSD conformer benchmark set
Supplementary Table 3: Benchmark performance of various methods on CSD conformer benchmark set
Supplementary Figure 5: Distribution of RMSE and MAE errors for various
Supplementary Table 4: MAE for energy predictions (kcal mol⁻¹) on GMTKN55 subsets
Supplementary weblinks
AIMNet2 Training Datasets
The datasets contain molecular structures and properties computed with the B97-3c (GGA DFT) or wB97M-def2-TZVPP (range-separated hybrid DFT) methods. Each data file contains about 20M structures. DFT calculations were performed with ORCA 5.0.3. Properties include energies, forces, atomic charges, and molecular dipole and quadrupole moments.
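As a rough illustration of how the listed properties relate to one another, the sketch below computes the dipole and traceless quadrupole moments that a set of atomic point charges would produce. It is not the dataset's file format or the AIMNet2 code: the function names, the toy geometry, and the charge values are hypothetical, and the DFT multipole moments in the dataset come from the full electron density, so a point-charge estimate will generally only approximate them. The quadrupole uses the traceless convention Q_ab = (1/2) Σ q_i (3 r_a r_b − |r|² δ_ab); other conventions differ by a constant factor.

```python
def point_charge_dipole(charges, positions):
    """Dipole vector mu = sum_i q_i * r_i, in units of e*Angstrom."""
    mu = [0.0, 0.0, 0.0]
    for q, r in zip(charges, positions):
        for a in range(3):
            mu[a] += q * r[a]
    return mu


def point_charge_quadrupole(charges, positions):
    """Traceless quadrupole tensor Q_ab = 1/2 sum_i q_i (3 r_a r_b - |r|^2 delta_ab)."""
    quad = [[0.0] * 3 for _ in range(3)]
    for q, r in zip(charges, positions):
        r2 = sum(x * x for x in r)
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                quad[a][b] += 0.5 * q * (3.0 * r[a] * r[b] - r2 * delta)
    return quad


# Hypothetical water-like toy system: charges in e, coordinates in Angstrom.
toy_charges = [-0.8, 0.4, 0.4]
toy_positions = [
    (0.00, 0.00, 0.00),   # O
    (0.76, 0.59, 0.00),   # H
    (-0.76, 0.59, 0.00),  # H
]

toy_dipole = point_charge_dipole(toy_charges, toy_positions)
toy_quadrupole = point_charge_quadrupole(toy_charges, toy_positions)
```

For a neutral system such as this toy example, the point-charge dipole does not depend on the choice of origin; for charged species it does, so any comparison against reference values needs a shared origin convention.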