Abstract
Atomistic simulations driven by machine learning-based potentials (MLPs) are a cost-effective alternative to ab initio molecular dynamics (AIMD). Yet their broad application to reaction modelling remains hindered, in part, by the need for large training datasets that adequately sample the relevant potential energy surface, including high-energy transition state (TS) regions. To optimise dataset generation and extend the use of MLPs to reaction modelling, we present a workflow that combines automated active learning with well-tempered metadynamics and requires no prior knowledge of TSs. Using data-efficient architectures, such as the linear Atomic Cluster Expansion, we illustrate the performance of this strategy for three organic reactions in which the environment is described at different levels: the SN2 reaction between fluoride and chloromethane in implicit water, the methyl shift of 2,2-dimethylisoindene in the gas phase, and a glycosylation reaction in explicit dichloromethane solution, where competing pathways exist. The proposed training strategy yields accurate and stable MLPs for all three cases, highlighting its versatility for modelling reactive processes.
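As a rough illustration of the well-tempered metadynamics ingredient alone (not the workflow or code described here), the sketch below applies the standard well-tempered rule, in which each newly deposited Gaussian is scaled down by the bias already accumulated at that point. The function names, the one-dimensional collective variable, and all numerical values (initial height, width, temperature, bias factor) are assumptions chosen for illustration only.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def bias(s, centers, heights, sigma):
    """Bias potential at collective-variable value s: sum of deposited Gaussians."""
    return sum(
        h * math.exp(-((s - c) ** 2) / (2.0 * sigma ** 2))
        for c, h in zip(centers, heights)
    )


def deposit(s, centers, heights, sigma, w0, temperature, bias_factor):
    """Deposit one well-tempered Gaussian at s and return its height.

    The height is w0 * exp(-V(s) / (kB * dT)) with dT = (bias_factor - 1) * T,
    so repeated visits to the same region add progressively less bias.
    bias_factor must be greater than 1.
    """
    delta_t = (bias_factor - 1.0) * temperature
    h = w0 * math.exp(-bias(s, centers, heights, sigma) / (KB * delta_t))
    centers.append(s)
    heights.append(h)
    return h


# Hypothetical example: deposit five Gaussians at the same CV value.
centers, heights = [], []
deposited = [
    deposit(0.0, centers, heights, sigma=0.1, w0=1.0,
            temperature=300.0, bias_factor=10.0)
    for _ in range(5)
]
```

In this toy setting the first Gaussian has the full initial height and each later one is smaller, which is the tempering behaviour that keeps the bias from growing without bound while still pushing sampling towards high-energy regions.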
Supplementary materials
Title
supporting information
Description
Hyperparameters used for training and a detailed description of the training protocol for each of the reactions presented here.