Using physical property surrogate models to perform multi-fidelity global optimization of force field parameters


Dispersion-repulsion interactions, commonly represented in atomistic force fields by the Lennard-Jones (LJ) potential, play an important role in the accuracy of molecular simulations. Training the LJ parameters of a force field is challenging because it generally requires adjustments guided by simulations of macroscopic physical properties. The computational expense of these simulations limits the size of the training data set and the number of optimization steps that can be taken, often restricting modelers to optimizations within a local region of parameter space. To allow for global LJ parameter optimization against large training sets, we introduce a multi-fidelity optimization technique that uses Gaussian process surrogate modeling to build inexpensive models of physical properties as a function of LJ parameters. These surrogates allow objective functions to be evaluated quickly, greatly accelerating searches over parameter space. We use an iterative framework that performs global optimization at the surrogate level, followed by validation at the simulation level and refinement of the surrogates. Using this technique on two previously studied training sets, containing up to 195 physical property targets, we refit a subset of the LJ parameters for the OpenFF 1.0.0 "Parsley" force field. We demonstrate that this multi-fidelity technique can find improved parameter sets compared to a purely simulation-based optimization by searching more broadly and escaping local minima. In most cases, these parameter sets are transferable to other, similar molecules in a test set. This multi-fidelity technique provides a platform for fast optimization against physical properties, which can be refined and applied in multiple ways to the development of molecular models.
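The iterative loop described above (global search on a cheap surrogate, validation by an expensive model, then surrogate refinement) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `expensive_simulation` is a hypothetical analytic stand-in for a molecular simulation of a single physical property, `TARGET` is a made-up reference value, the Gaussian process uses fixed (not fitted) RBF hyperparameters, and the surrogate-level global search is plain random sampling rather than the optimizer used in the paper. The two parameters are named `eps` and `rmin` only to suggest LJ parameters.

```python
import numpy as np

# HYPOTHETICAL stand-in for an expensive simulation: maps two LJ-like
# parameters (eps, rmin) to one "physical property" value.
def expensive_simulation(params):
    eps, rmin = params
    return np.sin(3.0 * eps) + (rmin - 1.8) ** 2

TARGET = 0.5  # HYPOTHETICAL reference (experimental) value of the property


def objective(prop):
    # Squared deviation of the (predicted or simulated) property from TARGET.
    return (prop - TARGET) ** 2


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    # Squared-exponential covariance between rows of a and rows of b.
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / length_scale**2)


class GPSurrogate:
    """GP regression on mean-centered data; hyperparameters are fixed (assumption)."""

    def __init__(self, noise=1e-6):
        self.noise = noise  # small jitter keeps the Cholesky factorization stable

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = float(np.mean(y))
        k = rbf_kernel(self.x, self.x) + self.noise * np.eye(len(self.x))
        chol = np.linalg.cholesky(k)
        # alpha = K^{-1} (y - mean), via two triangular solves.
        self.alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y - self.y_mean))
        return self

    def predict(self, xq):
        # Posterior mean at the query points.
        kq = rbf_kernel(np.asarray(xq, dtype=float), self.x)
        return self.y_mean + kq @ self.alpha


def multi_fidelity_optimize(bounds, n_initial=8, n_iterations=10,
                            n_candidates=5000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    # Initial design: a few expensive evaluations at random parameter sets.
    x = rng.uniform(lo, hi, size=(n_initial, len(lo)))
    y = np.array([expensive_simulation(p) for p in x])
    for _ in range(n_iterations):
        gp = GPSurrogate().fit(x, y)
        # Global search at the surrogate level: score many candidates cheaply.
        cand = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
        best = cand[np.argmin(objective(gp.predict(cand)))]
        # Validate the surrogate optimum with the expensive model and add the
        # result to the training data so the next surrogate is refined.
        x = np.vstack([x, best])
        y = np.append(y, expensive_simulation(best))
    i = int(np.argmin(objective(y)))
    return x[i], y[i], len(y)


params, prop, n_sims = multi_fidelity_optimize([(0.0, 1.0), (1.5, 2.1)])
print(params, prop, n_sims)
```

In this sketch each surrogate prediction is a small matrix-vector product, so scoring thousands of candidate parameter sets costs far less than a single call to the expensive model; only one expensive evaluation is spent per iteration, which is what makes a broad search affordable.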


Supplementary material

Supplementary Information for "Using physical property surrogate models to perform multi-fidelity global optimization of force field parameters"
Full description of the training sets, performance metrics on the training sets, and performance metrics on the test set.