Exploration and Validation of Force Field Design Protocols through QM-to-MM Mapping

The scale of the parameter optimisation problem in traditional molecular mechanics force field construction means that design of a new force field is a long process, and sub-optimal choices made in the early stages can persist for many generations of the force field. We hypothesise that careful use of quantum mechanics to inform molecular mechanics parameter derivation (QM-to-MM mapping) should be used to significantly reduce the number of parameters that require fitting to experiment and increase the pace of force field development. Here, we design a collection of 15 new protocols for small, organic molecule force field design, and test their accuracy against experimental liquid properties. Our best performing model has only seven fitting parameters, yet achieves mean unsigned errors of just 0.031 g/cm3 and 0.69 kcal/mol in liquid densities and heats of vaporisation, compared to experiment. The software required to derive the designed force fields is freely available at https://github.com/qubekit/QUBEKit.


Introduction
Classical molecular mechanics force fields are widely used approximations to recover the energy and forces of an atomistic system as a function of its nuclear coordinates. Force fields are an invaluable companion to quantum mechanical modelling in cases where the number of atoms or the time scale to be modelled would be otherwise restricted. Such techniques have seen applications in, for example, the modelling of battery materials, 1,2 organic light-emitting diodes, 3,4 crystal structure prediction, 5 and particularly in computeraided drug design where they form the basis of molecular dynamics, docking and free energy simulations for calculating protein-ligand binding affinity. 6,7 Typically, for modelling organic molecules in biology and chemistry, the force field is a sum of bonded (including harmonic bond-stretching and angle-bending terms, and anharmonic 4-body torsion potentials) and non-bonded (including Coulombic and Lennard-Jones interactions) terms. Small molecule force fields of this form include GAFF, 8 CGenFF, 9 OPLS, 10 and the Open Force Field 1.0.0 ('Parsley') force field. 11 In general, the bonded and charge parameters of the force field may be fit to quantum mechanical potential energy surfaces and electrostatic potentials, respectively. On the other hand, the Lennard-Jones parameters are nearly always fit to experimental pure liquid properties. 10,12 To be broadly effective, small molecule force fields require accurate parameterisation of each of the above terms for all chemical space. While recent trends have seen parameters fit to larger datasets, complete coverage is challenging. Numerous studies, such as the SAMPL blind challenges 13 have shown that there is plenty of room for improvement when it comes to predictive molecular modelling with force fields.
In contrast to the empirical molecular mechanics force fields common in biological modelling, interatomic potentials may instead be derived directly from quantum mechanics (QM).
For example, intermolecular perturbation theory may be used to separate the full QM interaction energy into physically motivated components, 14 or machine learning based potentials may be trained on QM energies and forces. 15 These methods are attractive because they remove the empiricism of the force field approach, but they are computationally more expensive, and unless they fully incorporate, for example, many body and quantum nuclear effects they are unlikely to reproduce condensed phase observables with sufficient accuracy.
QM-to-MM mapping reduces size of parameter search space.
Between the empiricism of typical classical force fields and the accurate, yet expensive, ab initio derived force fields, increasing attention is being drawn to mapping physically motivated parameters from QM into simple MM functional forms. By deriving bespoke force field parameters for the molecule under study directly from QM, the number of transferable, empirical parameters to be fit is significantly reduced. By retaining common MM functional forms, the potentials may readily be implemented in widely used MM software and used in, for example, free energy calculations. Bespoke intramolecular bond, angle and, particularly, torsion parameters may be readily derived from a small number of QM calculations either by fitting to Hessian matrices 16 or potential energy surface scans. 17,18 Atom-centred partial charges can be routinely derived from either semi-empirical calculations, 19,20 or QM electrostatic potential fitting, 21,22 or atoms-in-molecule electron density partitioning. [23][24][25][26] Less common, but perhaps most interestingly, is the possibility of using atoms-in-molecule electron density partitioning to derive other components of the non-bonded interaction from QM, in particular dispersion coefficients (C 6 , 27-30 C 8 , 31,32 ...), atomic polarisabilities, 32,33 and off-centre charges to model electronic anisotropy. 28,29 For example, Visscher and Geerke have derived a polarisable force field model, with a higher order dispersion term derived from iterative Hirshfeld atoms-in-molecule analysis, and applied it to small-molecule amino acid analogues. 34 Of the 138 nonbonded parameters in their model, 132 are determined from QM, leaving just six to be fit to experiment. Kantonen et al 35 employ a mapping of information from minimal basis iterative stockholder (MBIS) atomic electron densities (in particular the decay constants of the electron densities), to derive atom-specific Lennard-Jones parameters.
The mapping parameters (two per element) are fit to experimental liquid properties using the ForceBalance software. 36 Our own work has focused on the development of a QUantum mechanical BEspoke (QUBE) force field and associated toolkit (QUBEKit), 29 built on QM-to-MM parameter mapping. QUBE bond and angle force field parameters are derived from the QM Hessian matrix of the molecule under study, using the modified Seminario method. 37 Atomic partial charges are computed from the density derived electrostatic and chemical (DDEC) partitioned atomic electron densities. 38,39 The Tkatchenko-Scheffler method 27 is used to derive C 6 parameters from the same atomic electron densities, and the repulsive part of the Lennard-Jones potential is derived from atoms-in-molecule atomic radii. 28 Once all other parameters are in place, flexible torsion parameters can be fit to QM dihedral scans using interfaces between QUBEKit and external QM and MM software packages. 29 A small number of mapping parameters (in this case free atom radii for use in the derivation of Lennard-Jones parameters) is used to ensure accuracy of condensed phase properties, 29 and the resulting force fields have also been shown to perform well in the calculation of protein-ligand binding free energies. [40][41][42] Force field design choices can be tested and optimised.
The above mentioned force fields derived from QM-to-MM parameter mappings have a simple functional form, are straightforward to derive from a small number of simple QM calculations, and are bespoke to the system under study. However, despite the advantages, there are still multiple design choices that must be made when constructing a QM-derived force field.
These range from the choice of functional form to use for the final force field, to the choice of underlying QM method to compute the electron density, how to partition it between atoms, and how to map the partitioned density to force field parameters. These choices may be regarded as hyperparameters of the model. 43 Drawing on lessons from the machine learning field, separate models for each hyperparameter should be systematically trained and tested.
However, these hyperparameters are often 'baked in' to the design of the force field, and correcting known problems can be challenging. 44 Until recently, the manual training of force field parameters for a given set of hyperparameters (for example, Lennard-Jones parameters for a given charge model) would be extremely time-consuming. However, allied with improvements in data collection and retrieval, 45 advances in automated parameter fitting to experimental liquid properties now make this possible. 36,46,47 Most notably, the ForceBalance software enables reproducible and automated force field parameterisation against a set of target data that can include either QM or experimental input data. In the context of the current work, ForceBalance was used to tune the QM-to-MM mapping parameters against experimental liquid data in the aforementioned study by Kantonen et al, 35 and was also used to rapidly re-fit Lennard-Jones parameters for a range of proposed implicit solvent models (hyperparameters) in the development of the RESP2 method. 22 An automated toolkit for force field design, training and testing.
Until now, the systematic testing of design choices in MM force fields has been rather limited, and it is almost impossible for individual users to undertake this task due to the scale of the parameter optimisation problem. In what follows, we describe our interface between the QUBEKit and ForceBalance software packages. This open source (including all dependencies) software workflow allows users to make a choice of force field hyperparameters, train the model against experimental liquid properties, and unambiguously test the resulting accuracy. We provide improved protocols for deriving the positions and magnitudes of off-site charges (virtual sites) to model anisotropic electron density from the output of atoms-in-molecule electron density partitioning calculations, as well as tools for including them in torsion parameter fitting. Using a train/test split of 15/51 small, organic molecules, we develop 15 force field protocols that aim to test the accuracy of different choices of i) underlying QM method and basis set, ii) atoms-in-molecule electron density partitioning method, iii) implicit solvent model parameters (used to pre-polarise charges for use in the condensed phase), iv) mapping of atomic electron densities to LJ parameters, v) assignment of Lennard-Jones parameters to polar hydrogen atoms, and vi) use of virtual sites to model anisotropic electron density. We show that a wide range of different choices can be made in the force field design process, some of which do not affect the overall accuracy, and some of which show marked improvements. For our best performing protocol, we provide all of the parameters required for users to derive their own force fields using the QUBEKit software.
With these developments comes the potential for future rapid design of next-generation force fields.

QUBEKit Workflow
QUBEKit is a largely Python 3.6+ based force field derivation toolkit for mac and Linux operating systems. By combining multiple open-source bioinformatics and quantum chemistry packages into a single workflow, it is possible to generate bespoke force fields for molecular dynamics simulations (Figure 1(a)). QUBEKit can be run using only Anaconda or PyPI installable open-source packages, with the option to use Gaussian 48 for QM and torsion optimisations, and the fortran package Chargemol 24,25 for atoms-in-molecule analysis.
The general workflow of QUBEKit begins with generating a set of initial parameters for an OPLS-style force field: where k r , r 0 , k θ , θ 0 , V 1−4 , q i , ij and σ ij are force field parameters to be derived. QUBEKit can take as input most industry-standard molecule formats, such as a SMILES string, or pdb or mol2 file. Initial parameters may be generated from the Open Force Field toolkit, 11 Antechamber, 49 or a supplied XML file. From there, multiple conformers of the molecule coordinates are generated and used in a fast but basic optimisation via TorchANI, 50 XTB 51 or OpenMM. 52 Of these conformers, the lowest energy result is taken to the QM optimisation stage. The QM structural optimisation is performed using either PSI4 53 or Gaussian, 48 both run through QCEngine 54 with an ultra-fine grid. If the first, lowest energy conformer fails to optimise after 50 iterations, the next conformer is passed instead. The QM optimised coordinates are used as input for an electron density calculation, a Hessian matrix calculation and optional 1D torsiondrive calculation(s) (Figure 1(a)). The Hessian matrix is used to calculate all bond and angle parameters (k r , r 0 , k θ , θ 0 ) via the modified Seminario method. 37 QUBE uses atoms-in-molecule (AIM) electron density partitioning to obtain all nonbonded parameters of the force field. AIM methods partition the total electron density n(r) into overlapping atomic densities n i (r) according to: where the form of the weights, w i (r), are determined by the choice of AIM method. Common choices include (iterative) Hirshfeld, 26 MBIS 23 and density derived electrostatic and chemical (DDEC) 24,25 partitioning. Following AIM partitioning, atom-centred partial charges are simply assigned by integrating the atomic electron densities: where z i is the nuclear charge. It is well-known that partial charges in non-polarisable force fields need to be scaled in some way to account for polarisation in an effective manner in condensed phase simulations. Common approaches include use of bond order corrections or charge scaling factors, 19,20 but we prefer to use an implicit solvent model with a dielectric constant intermediate between the gas and water phases. 22,28,55 The final charges thus depend on the choice of underlying QM method used to compute n(r), the choice of AIM method, and the choice of implicit solvent model and parameters. Previously we computed the electron density using the linear-scaling DFT code, ONETEP, 56 using the PBE exchange-correlation functional and a non-orthogonal generalised Wannier function basis set. 29 A multigrid Poisson solver 57 was used with a dielectric of 4 to pre-polarise the charges, and an implementation of the DDEC AIM method in ONETEP (similar to DDEC3 39 ) was employed to partition the electron density. As discussed in the Introduction, we now have the infrastructure in place to systematically test the effects of these hyperparameter choices. Table 1 summarises the alternative model protocols that will be investigated here to test the effect of the underlying QM method (models 1a and 1b), the implicit solvent parameters (models 2a-2c) and AIM partitioning scheme (models 3a and 3b) on the force field parameters, and physical properties.
In the original QUBE force field, the non-bonded part of the force field includes the Lennard-Jones interaction: between pairs of atoms i, j. The dispersion coefficient, B i is derived for the atom in the molecule via the Tkatchenko-Scheffler (TS) relation: 27 where V AIM i is the third radial moment of the partitioned atomic electron density: The corresponding quantities B f ree i and V f ree i may be computed and tabulated for free atoms in a vacuum (Table S1). To ensure that the Lennard-Jones potential has a minimum at the effective van der Waals radius of the atom-in-molecule, the coefficient of the repulsive term may be approximated via: 28 where the AIM radius, R AIM i , is again found by re-scaling a reference free atom radius: The R f ree i are free parameters to be fit to experiment, which we have found to be crucial if the force field is to reproduce condensed phase properties. Previously, we have fit these via parameter scans to liquid properties (densities and heats of vaporisation). 28 However, as discussed in the Introduction, a new set of parameters is required for each set of model hyperparameters, and so we employ here a more automated approach as set out later.
Transforming the A i and B i parameters to the σ i and i parameters of eq 1, we obtain (a full derivation is given in Supporting Information S1.1): and: Throughout this study, the Lorentz-Berthelot combination rules are used to derive ij and σ ij from the corresponding atomic quantities. Eq 9 results in atoms with more diffuse electron density having a larger Lennard-Jones σ i parameter but, as correctly pointed out previously, 35 all atoms of the same element have the same Lennard-Jones well depth (eq 10).
In that same study, the authors proposed that each atom should have a unique effective ionisation energy, derived from the partitioned atomic electron density, which affects the scaling relationship in eq 5. 35 An additional consideration is that the dispersion coefficient (B i ) should be viewed as an effective interaction, taking into account not only the dipole-dipole, but also higher-order (dipole-quadrupole etc) contributions to dispersion. It has been shown that physics-based derivations of B i tend to therefore be lower than the effective coefficients in common MM force fields. 30,40 To take both of these factors into account, we test here the effect of introducing additional flexibility in the definition of the dispersion coefficient: where α and β are global fitting parameters. This change has no effect on σ i , but i is now dependent on the diffuseness of the atomic electron density, measured by V AIM i (a full derivation is given in the Supporting Information S1.2): Note that β can be positive or negative. Thus, as summarised in Table 1, we can set α = 1 and β = 0 to retain the original QUBE force field model (e.g. model 0), or optimise α and The effects of these choices are discussed later.
An additional choice in force field design is whether to include Lennard-Jones parameters on polar hydrogen atoms, or to effectively redistribute them onto the neighbouring heavy atom 28 (see Supporting Information S1.3). Different choices are made for example in the design of the TIP3P and CHARMM modified water model, 58 while there is precedent for improved agreement with liquid data using Lennard-Jones parameters on polar hydrogen atoms. 59 While separate mapping parameters have been previously proposed for polar and non-polar hydrogen atoms, the effect on accuracy has not been directly explored, 35 and so we add this comparison to our list of model protocols to be tested (Table 1, model 4a).
A final consideration to be made when deriving the non-bonded parameter set is how to treat atoms with significant anisotropy in their electron density, such that an atom-centred point charge gives a poor approximation to the full QM electrostatic potential at the surface of the molecule. Such situations are commonly treated in MM force fields using an offcentre charge (or "virtual site"), as used in, for example, the TIP4P water model 60 and various treatments of lone-pairs and σ-holes in common force fields. [61][62][63] In keeping with our general force field philosophy, we previously proposed a method to derive the positions and magnitudes of off-site charges, where required, directly from the partitioned QM electron density. Full details are given elsewhere, 29 but in brief the charges are positioned to minimise the quantity: where V i QM is the electrostatic potential generated by the partitioned atomic electron density at a sample point i, V i M M is the corresponding quantity calculated using the point charge model, including any off-centre charge(s), and n is the total number of sample points (located between 1.4 to 2.0 times the van der Waals radius of the atom). By using the atomic, rather than the molecular, electron density, the method scales easily to larger molecules. To limit the search space, off-site charges were restricted to positions determined by the symmetry of the atom's bonding environment. 29 For example, for halogens bonded to a single atom, the search direction is along the bond vector, and for oxygen bonded to two neighbours, the search direction is along the bisector of the two bonds. Similar arguments can be made for positioning two virtual sites. 29 A similar protocol is followed in the current work. Key differences are discussed here and full implementation details are provided in the Supporting Information Section S2.
It is desirable to perform the optimisation of eq 13 directly in QUBEKit, but storage and imports of the full atomic electron density is inefficient. Instead, V i QM is now reconstructed from the QM atomic multipole moments, up to quadrupole order, which are readily extracted from either Chargemol (using the DDEC AIM method) 24,25 or the PSI4 software (using the MBIS AIM method). 53 For atoms with isotropic electron density, the atomic multipoles are low, and the QM electrostatic potential is well-described by an atom-centred point charge (monopole). For atoms with anisotropic electron density (defined here as Φ > 1 kcal/mol), eq 13 is minimised with respect to the virtual site charge and position, using the Scipy python package.  Table 1 shows the experiments performed in what follows to test the effect of including virtual sites on our force field model accuracy (models 5a-5e).
In previous works, new non-bonded force field parameters tend to be investigated in conjunction with the bonded parameters of standard transferable force fields, which may lead to incompatibility given the close inter-dependency between parameter types. Here, the benefit of assembling the force fields using the QUBEKit package, is that torsion parameters for rotatable bonds may be re-fit in a simple, automated manner for compatibility with the rest of the bonded and non-bonded parameters. Using a new interface between QUBEKit, ForceBalance 36 and TorsionDrive, 64 dihedral parameters of any freely rotatable bonds were optimised via a least squares minimisation of the root mean square error (RMSE) between QM and MM torsional potential energy surfaces. We use the same procedure for parameter fitting as the recent Open Force Field Parsley force field, 11 and full details are provided in Supporting Information Section S3.1. A new feature in this work is that we have modified the ForceBalance code to allow parameter fitting in the presence of local coordinate virtual sites. These virtual sites then inherit their exceptions and exclusions from the parent atom meaning they interact with the same rules as the parent. The code may be obtained from conda-forge or github (https://github.com/leeping/forcebalance), and is available from version 1.9.0 onward. Figure 1(c) demonstrates an example of bespoke dihedral parameter fitting for the molecule acetic anhydride, which has two virtual sites on the central oxygen atom. With the initial parameter set (which is incompatible with the new Lennard-Jones parameters and virtual sites), the RMS error is more than 3 kcal/mol, but following fitting it falls to just 0.12 kcal/mol. In fact, the average RMSE across all molecules in the 51 molecule test set (a total of 117 scans) after fitting was just 0.13 kcal/mol.

ForceBalance Workflow
As outlined in Figure 1(a), for each choice of force field design protocol (Table 1)

Results
Training a range of force field models identifies patterns in accuracy.
Using our new interface between the QUBEKit engine for QM-to-MM force field parameter mapping and the ForceBalance software for parameter tuning, we have trained a total of 15 different force field protocols to investigate the effects of choices made in force field design on model accuracy. Descriptions of the force field protocols are listed in full in Table 1, grouped roughly by the type of hyperparameter being investigated.
The first groupings investigate the choice of underlying QM methods used to compute the total electron density for AIM partitioning and subsequent non-bonded parameter derivation (the same QM method is also used for Hessian matrix calculation and dihedral scans to obtain the bonded parameters). As mentioned, our first implementation of QUBEKit 29 interfaced with the ONETEP DFT code, 56 which uses the PBE exchange-correlation functional, for non-bonded parameter derivation. However, this new version of QUBEKit is interfaced, via

QM Method
First, we investigate the use of two relatively high level QM methods. The ωB97X-D/6-311++G(d,p) method is used for bonded parameter derivation in earlier QUBE force fields, 29,65 as well as the OPLS-AA/M protein force field. 66 The B3LYP-D3(BJ)/DZVP method is used for bonded parameter fitting in the Parsley force field 11 and has been shown to give a good compromise between accuracy and computational expense for gas phase conformational energetics of small organic molecules. 67 It has been shown that the choice of DFT functional is not too critical in the reproduction of molecular electrostatic potentials, while larger basis sets afford some improvements in accuracy. 22 However, relatively little is known about how this choice translates to force field accuracy. Runs 0 and 1a reveal that this choice is not critical, with nearly identical training set accuracy obtained for both liquid densities (0.027 g/cm 3 ) and heats of vaporisation (0.77 vs 0.73 kcal/mol). It should be emphasised that this does not mean that the non-bonded parameters themselves are identical. Table 2 shows the final R f ree i parameters after ForceBalance training. There is a notable reduction in the radius of the N atom, using the B3LYP-D3(BJ)/DZVP method, and corresponding increase in both the polar and non-polar H atoms. This backs up the recent assertion that improving the accuracy of atomic charges is unlikely to improve calculation accuracy without a corresponding optimisation of Lennard-Jones parameters. 22 For comparison, in run 1b, we employed the HF/6-31G(d) method, which has been used historically for the fitting of force field ESP charges. 21 The argument goes that this method produces a fortuitous over-polarisation of the electron density in the gas phase to yield charges that are suitable for condensed phase modelling, and so we perform these electron density calculations in vacuum ( Table 1). The results are very similar to the two higher level QM methods in implicit solvent, but perhaps this is not too surprising, given the success of many force fields that are built from the HF/6-31G(d) method. However, as we will show, we can now do better with improved physics-based protocols.

Implicit Solvent Model
In previous versions of QUBEKit, we have computed the electron density using a minimal parameter solvent model, with the solute cavity defined by an isosurface of the electron density. 57 We argued that a solvent dielectric = 4 is appropriate, as it leads to charges that are polarised midway between vacuum and a dielectric medium of = 78 (i.e. water). 28 Gaussian09 provides the IPCM implicit solvent model, 68 which also builds the solute cavity from an isosurface of the electron density, and so we use this model as our default here.  Table 2 shows that the R f ree i parameters in run 2a tend to be lower than the others, which indicates that the minimum of the Lennard-Jones interaction should be closer to the atom for the less-polarised charge model. Compared to vacuum, we found that molecular dipole moments are scaled by a factor of 1.10x ( = 2) and 1.22x ( = 20), on average, indicating that a suitable range of condensed phase polarisation has been captured in these force field models by tuning the dielectric of the background implicit solvent ( Figure S4).

AIM Partitioning Method
Our previous version of QUBEKit relied on an implementation of the DDEC AIM approach in the ONETEP DFT code. 39 The DDEC AIM method is described elsewhere, 24 but in short it aims to construct the optimal weighting factors of eq 2 that result in approximately spherical atomic electron densities, so that its multipole expansion converges rapidly, whilst ensuring that the resulting charges are chemically reasonable. Alongside the observation that DDEC charges show excellent transferability between different conformations of the same molecule, 25 these properties make the DDEC AIM approach very promising for flexible force field design.
In the previous sections, we employed the latest DDEC6 AIM approach for the first time in organic molecule force field design, through the interface between Gaussian09 and Chargemol. In model 3a, we investigate the effect of switching to the older DDEC3 approach.
The accuracy is actually slightly better for DDEC3, with a decrease in both density and heat of vaporisation errors, compared to the baseline model 0. DDEC6 has several methodological improvements, which are summarised elsewhere 24 and have been shown to result in more robust convergence, lower computation times, lower ESP errors, and higher transferability, when compared to DDEC3. 24,25 However, most of these advantages will not be apparent in the small, relatively rigid, organic molecules, which lack buried atoms and are simulated here under standard conditions.
MBIS is similar in idea to the Hirshfeld family of AIM methods. 23 In this method the total electron density is partitioned onto a minimal set of s-type Slater functions, whose  As discussed in the Methods section, the default QUBE protocol for deriving the attractive part of the Lennard-Jones potential (B ij in eq 4) from QM tends to lead to dispersion coefficients that are lower than those used in common effective force fields 40 and energy well depths that are constant across atoms of the same element. 35 Model 4b in Table 1 investigates the effects of relaxing this protocol, by introducing two new global, variable parameters (α and β) into the fit, thus moving away from the default Tkatchenko-Scheffler (TS) approach. As expected, a reduction in training set errors is achieved, relative to the default model 0, with these extra fitting parameters. Table 2 shows that this is achieved with α > 1 and β > 0, both factors that act to increase the well depth for atoms with more diffuse electron density (eq 12). Figure 2 plots the dispersion coefficients (summed over all atoms in the molecule) for all molecules in the larger test set, using the identical Lennard-Jones scheme (from model 5b, see later), against the corresponding quantities extracted from the Parsley force field. Unlike previous protocols (see our previous work, 40 and Section S5.2),

Lennard-Jones Parameters
there is a very good correlation between the two force fields, albeit with a consistent offset.
While the Parsley parameters are extracted from a library, which in turn has been fit to condensed phase properties, and ours are derived from QM calculations specifically for the molecule under question, the convergence of the two approaches is encouraging.
In addition, we investigate the question of whether to include Lennard-Jones parameters on polar hydrogen atoms, or to set them to zero and effectively absorb them onto the neighbouring heavy atom (Section S1.3). 28 In Table 1, we show that force field protocol 4a, which has no Lennard-Jones parameters on polar hydrogens (denoted 'H0'), has a similar density error, but higher heat of vaporisation error (0.90 kcal/mol) compared to the other protocols. Interestingly, this model has a much lower R f ree i for nitrogen than the other models, but the low density error indicates that this does not propagate through to too short hydrogen bonding distances in condensed phase simulations. Nevertheless, we conclude that explicitly including Lennard-Jones parameters on polar hydrogen atoms has accuracy benefits, and we continue to use this approach in the following force field models.

Virtual Sites
Figure 1(c) shows an example benefit of including just a small number of virtual sites on a molecule in terms of the reproduction of the QM electrostatic potential (in this case up to quadrupole order). In fact, considering a larger test set of 51 molecules, we found that 33 molecules had electron density anisotropy above our set threshold (1 kcal/mol, eq 13) for at least one atom. As expected from our previous work, 29  However, a more pertinent question is whether a force field model that more accurately captures the QM electrostatic potential also results in more accurate prediction of condensed phase properties. We have therefore trained a series of force field protocols that include virtual sites in the description of the electrostatics (models 5a-5e). Table 1  Relative training set accuracy is retained in the test set To test the accuracy of the designed force fields on molecules outside the training set, we ran simulations of the condensed phase properties of a separate test set of 51 molecules containing H, C, N and O atoms only. The force fields brought through for study were models 0, 1a, 5b and 5d. The first two were chosen as our simplest (but also amongst the least accurate) protocols, differing only by the choice of underlying QM method. In both cases, we see a drop in overall accuracy, compared to the training set, which is to be expected for the larger, more complex molecules found in the test set ( Figure S3)  Table 2). It is difficult to pinpoint reasons for improvement in accuracy, since the physical properties are derived from many competing effects. However, Figure 4 shows some of the force field parameters for molecules that show differences in performance between our best model (5b), and Parsley. Figure 4 We can compare the accuracy of the protocols developed here with other force fields tested on the same or similar test sets. Using the RESP2 charge assignment method, which is based on fitting to the QM electrostatic potential in implicit solvent, with optimised Lennard-Jones parameters, the density and heat of vaporisation errors on an identical test set are 0.024 g/cm 3 and 1.67 kcal/mol. 22 Thus, substantial improvement in energetics is obtained in the current study, at the expense of a slight degradation in liquid density prediction. In that study, only a subset of ten Lennard-Jones parameters (the well depth and radius for each of C, O, N, H and polar H) were optimised for efficiency, though this is still more parameters than our most complex protocol (model 5b has seven adjustable parameters). With our automated methods for force field derivation and training, it appears that further accuracy gains can be expected by moving in future to polarisable and beyond-Lennard-Jones models.

Discussion and Conclusions
Force field design is a lengthy process, often requiring large teams of researchers working over a period of many years. Design decisions are typically made early in the process, and any inaccuracies stemming from these assumptions propagate through to the final force field.
Even the minimal Parsley force field has a set of 35 Lennard-Jones parameters, 11 and so large training sets are required to cover sufficient chemical space. Repeatedly performing the full training process to investigate alternative design decisions would substantially slow down the time to science. Furthermore, as we inevitably move to more complex and accurate functional forms, the size of the parameter space and danger of over-fitting will only become worse.
To solve this problem, we and others have advocated making use of QM-to-MM mapping procedures to significantly reduce the number of parameters that require fitting to experi- We have used this design and train cycle to answer a number of fundamental questions about force field design protocols. It was shown that the details of the underlying QM method and implicit solvent model have negligible effect on the accuracy of condensed phase modelling with the designed force fields. That is not to say that any set of charges gives identical results, but rather that as long as the charges and Lennard-Jones parameters are co-optimised, the overall accuracy of the force field is insensitive to the particular choice of model. In this case, future efforts should focus on the cheapest QM methods that continue to give reliable results. It is well known that there is no unique method to partition the total molecular electron density into atom-centred basins. We have therefore investigated two variants of the DDEC AIM method, and the MBIS approach. Surprisingly, the older DDEC3 method showed improvements over DDEC6, but given the improvements in the latter in robustness and transferability of the charges, we continue to use this approach.
The MBIS method provides a useful alternative AIM approach, available through the open source PSI4 software package. Once atomic electron densities are assigned, the details of the Lennard-Jones parameterisation also makes a difference to force field accuracy. Based on our data, we advocate setting non-zero Lennard-Jones parameters on polar hydrogen atoms, and re-scaling the strength of the QM-derived dispersion parameter. The latter approach gives not only an improvement in accuracy, but brings QM-derived and Parsley empirical Lennard-Jones parameters into closer agreement. Finally, consistent improvement in both training and test set accuracy is achieved by introducing a small number of off-site charges to model anisotropy in the QM electron density. It is encouraging that improving the underlying physics of the force field model, with minimal costs to run time, results in such improvements.
In the current study, we have focused on liquid density and heat of vaporisation as measures of force field accuracy. In future, additional properties, such as enthalpies of mixing, 71 could be used to extend the confidence in the range of application of the force fields.
More complex properties, such as free energies of hydration or even protein-ligand binding could be included (at least for testing), but would require infrastructure improvements to allow for virtual sites. The current study gives confidence that this investment of time would be worthwhile.
A notable feature of the current study is the number of force field models that can be rapidly designed. We have generated here 15 different models, with varying assumptions, which can be contrasted with the handful of general purpose small molecule force fields that are otherwise available. This opens up the possibility of new research into consensus property predictions using force field ensembles, which has been shown to be advantageous, for example, in recent protein-ligand binding free energy studies. 72 More broadly, we envisage QM-to-MM mapping potentials, such as these, providing synergy with traditional force fields. For example, our conclusions concerning force field design protocols can be fed into large-scale fitting efforts, such as the Open Force Field Initiative, to focus efforts in areas where accuracy improvements are expected. This will be particularly important as the community moves towards more advanced functional forms, for which the number of parameters to be fit will only increase. With the advent of machine learning models for parameterising force fields, 73 the force fields developed here could be used to provide regularisation of the parameter fits, to avoid unphysical predictions by the models. Or conversely, machine learning models could be trained to reproduce the outputs of our QM-to-MM mapping procedures to substantially reduce the computational cost of the parameterisation stage. 74 All software required to derive the designed force fields is freely available at https://github.com/qubekit/QUBEKit.