Abstract
Machine-learned reactive force fields based on polynomial expansions have been shown
to be highly effective for simulating reactive materials. Nevertheless, the flexibility of these models can give rise to a large number of
candidate parameters for complicated systems. In these cases, reliable parameterization requires a well-formed training set, which can be difficult to achieve through
standard iterative fitting methods. Here we present an active learning approach based
on cluster analysis and Shannon information theory to enable semi-automated generation of informative training sets and robust machine-learned force fields. We
demonstrate this tool by developing a model based on linear combinations
of Chebyshev polynomials that explicitly describes up to four-body interactions, for a
chemically and structurally diverse C/O system under extreme conditions. We
show that this flexible approach to managing the training repository enables development of models exhibiting excellent agreement with Kohn–Sham density functional
theory (DFT) in terms of structure, dynamics, and speciation.