Abstract
Predictive simulations of dynamic processes in molecular systems require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to constructing force-field models for large-scale molecular simulation by fitting to high-level quantum-mechanical data. However, machine-learned force fields generally require considerable human intervention and large volumes of training data. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models for diverse chemical systems can be developed autonomously, requiring only hundreds to a few thousand energy and gradient evaluations on the reference potential-energy surface. Our approach relies on a decomposition of the condensed-phase molecular system into intra- and inter-molecular terms, and on the definition of a prospective error metric to quantify accuracy. We demonstrate applications to a range of molecular systems, from bulk water, organic solvents, and a solvated ion through to the description of chemical reactivity, including a bifurcating Diels–Alder reaction in the gas phase and non-equilibrium dynamics (an SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for complex and/or reactive molecular systems.