Abstract
ReaxFF is a computationally efficient force field for simulating complex reactive dynamics in extended molecular models with diverse chemistries, provided reliable force-field parameters are available for the chemistry of interest. If not, they must be calibrated by minimizing the error ReaxFF makes on a relevant training set. Because this optimization is far from trivial, many methods, in particular genetic algorithms (GAs), have been developed to search for the global optimum in parameter space. Recently, two alternative parameter calibration techniques were proposed, i.e.\ the Monte-Carlo Force Field optimizer (MCFF) and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which have the potential to find good parameters at relatively low computational cost. In this work, these two methods, as implemented in ADF2018, are tested on three ReaxFF training sets that have previously been used to benchmark GAs. Even though MCFF and CMA-ES should not be considered exhaustive global optimizers, they can find parameters comparable in quality to those obtained with GAs. We observe that CMA-ES yields slightly better results and is less sensitive to the initial guess of the parameters. Concrete recipes are provided for obtaining similar results with new training sets.
Besides optimization recipes, a successful ReaxFF parameterization requires a well-designed training set. For every trial set of parameters, ReaxFF is used to optimize the molecular geometries in the training set. When some of these geometry optimizations fail, it becomes considerably harder to find the optimal parameters. We have addressed this issue by fixing several bugs in the ReaxFF forces and by improving the robustness of the geometry optimization. Because these improvements cannot eliminate all geometry convergence issues, we recommend avoiding very flexible geometries in the training set.
Both MCFF and CMA-ES remain liable to converge to sub- or near-optimal parameters, which we detected by repeating the calibration with different random seeds. The existence of distinct near-optimal parameter vectors is a general pattern throughout our study, and it provides opportunities to improve the training set or to detect overfitting artifacts.