Active Learning for Robust, High-Complexity Reactive Atomistic Simulations

Machine-learned reactive force fields based on polynomial expansions have proven highly effective for simulating reactive materials. Nevertheless, the flexibility of these models can give rise to a large number of candidate parameters for complicated systems. In these cases, reliable parameterization requires a well-formed training set, which can be difficult to obtain through standard iterative fitting methods. Here we present an active learning approach based on cluster analysis and Shannon information theory that enables semi-automated generation of informative training sets and robust machine-learned force fields. We demonstrate the tool by developing a model based on linear combinations of Chebyshev polynomials that explicitly describes up to four-body interactions for a chemically and structurally diverse C/O system under extreme conditions. We show that this flexible approach to managing the training repository enables development of models exhibiting excellent agreement with Kohn–Sham density functional theory (DFT) in terms of structure, dynamics, and speciation.
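
To make "linear combinations of Chebyshev polynomials" concrete, the following is a minimal sketch of a two-body energy term only, not the model described above. The function names, the linear mapping of distance onto [-1, 1], the cutoff values, and the coefficients are all illustrative assumptions; the higher-body terms and the active learning procedure are not shown.

```python
# Illustrative sketch: a pair energy written as a linear combination of
# Chebyshev polynomials. All names and numbers here are hypothetical.

def chebyshev_basis(s, order):
    """Return [T_1(s), ..., T_order(s)] using T_{n+1} = 2*s*T_n - T_{n-1}."""
    t_prev, t_curr = 1.0, s  # T_0(s) = 1, T_1(s) = s
    values = []
    for _ in range(order):
        values.append(t_curr)
        t_prev, t_curr = t_curr, 2.0 * s * t_curr - t_prev
    return values


def scaled_distance(r, r_min, r_max):
    """Map r in [r_min, r_max] linearly onto [-1, 1] (an assumed transform)."""
    return (2.0 * r - r_min - r_max) / (r_max - r_min)


def pair_energy(r, coeffs, r_min=0.5, r_max=4.0):
    """Energy of one atom pair: sum_n coeffs[n-1] * T_n(s(r)), zero past r_max."""
    if r >= r_max:
        return 0.0
    s = scaled_distance(r, r_min, r_max)
    basis = chebyshev_basis(s, len(coeffs))
    return sum(c * t for c, t in zip(coeffs, basis))


if __name__ == "__main__":
    coeffs = [0.8, -0.3, 0.1]  # made-up coefficients, for illustration only
    for r in (1.0, 2.0, 3.0):
        print(f"r = {r:.1f}  E_pair = {pair_energy(r, coeffs):+.4f}")
```

Because the energy is linear in `coeffs`, fitting such a term to reference data reduces to a linear least-squares problem, which is one reason the number of candidate parameters, rather than the fitting step itself, becomes the practical concern for complicated systems.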