Implementation of the QUBE Force Field for High-Throughput Alchemical Free Energy Calculations

QUBE High-Throughput Alchemical Energy The quantum mechanical bespoke (QUBE) force field approach has been developed to facilitate the automated derivation of potential energy function parameters for modelling protein-ligand binding. To date the approach has been validated in the context of Monte Carlo simulations of protein-ligand complexes. We describe here the implementation of the QUBE force field in the alchemical free energy calculation molecular dynamics simulation package SOMD. The implementation is validated by computing relative binding free energies for two congeneric series of non-nucleoside inhibitors of HIV-1 reverse transcriptase using QUBE and AMBER/GAFF force fields. The availability of QUBE in a modern simulation package that makes efficient use of GPU acceleration will greatly facilitate future high-throughput alchemical free energy calculation studies. Abstract The quantum mechanical bespoke (QUBE) force ﬁeld approach has been developed to facilitate the automated derivation of potential energy function parameters for modelling protein-ligand binding. To date the approach has been validated in the context of Monte Carlo simulations of protein-ligand complexes. We describe here the implementation of the QUBE force ﬁeld in the alchemical free energy calculation molecular dynamics simulation package SOMD. The implementation is validated by computing relative binding free energies for two congeneric series of non-nucleoside inhibitors of HIV-1 reverse transcriptase using QUBE and AMBER/GAFF force ﬁelds. The availability of QUBE in a modern simulation package that makes eﬃcient use of GPU acceleration will greatly facilitate future high-throughput alchemical free energy calculation studies.


Introduction
The ability to accurately predict protein-ligand binding affinity is invaluable in the early stages of drug discovery. High-throughput alchemical free energy calculations are an attractive tool for this task, enabling rigorous calculation of binding free energies. However, accuracy remains limited by the description of interatomic interactions and the sampling of conformational space. [1][2][3][4] The potential energy surfaces of protein-ligand complexes are almost always described by molecular mechanics (MM) force fields, of which AMBER, 5  Recently, an alternative approach to biomolecular force field design has been proposed, named the QUantum mechanical BEspoke (QUBE) force field, in which virtually all potential parameters are derived specifically for the molecule under study directly from a small number of QM calculations. QUBE shares its functional form with the OPLS force field, and so is rapid to evaluate in the context of alchemical free energy calculations. Full details may be found elsewhere. 9,10 In brief, non-bonded (charge and Lennard-Jones) parameters of the QUBE force field are derived from atoms-in-molecule partitioning of the ground state QM electron density, 11,12 in particular, employing the Tkatchenko-Scheffler relations for van der Waals interactions. 13 QUBE bond and angle parameters are derived from the QM Hessian matrix, using the modified Seminario method, 14,15 while anharmonic dihedral parameters are fit to relaxed QM torsion scans. 9 Small molecule QUBE force fields may be derived using the QUBEKit python package, and they have been extensively validated against experimental liquid properties. 9 Atoms-in-molecule protocols are available as part of the ONETEP linearscaling density functional theory software, 16 and hence QUBE non-bonded parameters may be readily derived for entire proteins comprising thousands of atoms. These parameters have been supplemented by compatible libraries of bonded parameters, and simulations of protein dynamics using the resulting QUBE force fields have been validated against experimental NMR observables. 10 To date, the QUBE force field has been used to compute absolute binding free energies of a series of benzene derivatives to the L99A mutant of T4 lysozyme 17 and relative binding free energies of several flexible inhibitors of p38α MAP kinase. 18 In both cases, mean unsigned errors (MUEs) in protein-ligand binding free energies of under 1 kcal/mol were reported.
Such accuracy was shown to be broadly similar to that of the widely-used OPLS force field in these cases. In the initial development of QUBE, the OPLS functional form has been retained for compatibility with existing MM software (for example, these studies used the MCPRO software 19 ). However, now that a baseline accuracy has been established, future development strategies will target rapid and systematic evolution of the force field functional form. Such a strategy involves identifying key mappings between QM observables (such as the electron density) and force field parameters. Examples include the automated addition of off-center charges to account for anisotropic electron density, 9 which has not yet been thoroughly explored in protein-ligand binding studies, 11 and higher-order dispersion terms to move beyond the dipole-dipole r −6 interaction. 20,21 For rapid testing of these new force fields, it is desirable to interface with free energy software that can be readily adapted to new force field functional forms whilst achieving efficient performance on modern computing hardware. Options for high-throughput alchemical free energy simulations include Schrödinger's commercial FEP+ package, 22 as well as AMBER 5 and GROMACS. 8 However, packages such as these tend to be developed specifically for a given force field and file format, reducing interoperability and ease of comparison. Indeed, a recent study of relative hydration free energies using different simulation packages, highlighted the careful steps required to reproduce this quantity across simulation packages. 23 For this reason, here we implement QUBE in the Sire molecular simulation framework 24 which includes the SOMD molecular dynamics engine for free energy calculations. SOMD interfaces with the OpenMM toolkit for GPU acceleration. 25 We also make use of the BioSimSpace library which facilitates system setup and interoperability between a range of biomolecular simulation packages, such as AMBER, GROMACS, and CHARMM, to facilitate preparation of QUBE inputs. 26 SOMD has been used within the Sire molecular simulation framework and successfully applied to alchemical free energy studies on a range of drug-like fragments, carbohydrates and host-guest systems. [27][28][29][30][31][32][33][34][35][36][37] The automated setup and processing of alchemical free energy calculations using Sire, BioSimSpace, SOMD and OpenMM has recently been implemented in Cresset's Flare package, 38 and benchmarked against 220 ligands bound to 14 protein targets, with accuracy comparable to previous reports. 3,8,22 Thus, the SOMD framework provides a promising basis for implementation and future benchmarking of alchemical transformations using the QUBE force field. In this paper, we develop and distribute the file parsers that allow users to run QUBE simulations of protein-ligand complexes in the Sire and OpenMM molecular simulation frameworks. We further demonstrate the use of the QUBE force field in alchemical free energy simulations with focus on datasets created from extensive work on the HIV-1 reverse transcriptase (HIV-1 RT) protein target by the Jorgensen group. [39][40][41][42][43] HIV-1 is one of two strains of HIV which cause AIDS; the most advanced stage of the HIV infection. The rapid replication of HIV-1 can lead to errors in viral replication and virus evolution, causing difficulties for effective vaccine development. 44 There are three viral enzymes that are targeted for drug development, one of which is HIV-1 reverse transcriptase (RT). HIV-1 RT is an asymmetric heterodimer with two subunits: p66, the larger of the two subunits, which contains the active sites of the two enzymatic functions of RT, and p51, the smaller subunit which has a structural role. 44 HIV-1 relies on RT to copy its single-stranded RNA genome into a double stranded DNA copy before it can be integrated into the host cell genome; preventing this reduces viral replication. RT first binds to DNA or RNA, followed by the binding of deoxynucleoside triphosphate (dNTP) in a two-step process, to form a tertiary complex. This induces a conformational change, enabling the 3'-hydroxyl of the elongating strand to attack the α-phosphate of dNTP, before the elongated DNA is released from RT. 45 The molecules investigated here are potential non-nucleoside inhibitors of HIV-1 RT (NNRTIs), which block polymerase activity of RT. [39][40][41][42] NNRTIs are non-competitive inhibitors, binding to a hydrophobic allosteric pocket which is adjacent to the polymerase active site, causing a conformation change in the enzyme, thus preventing the substrate from binding. 44 The allosteric site consists of 15 residues in the p66 subunit (L100, K101, K103, V106, T107, V108, V179, Y181, Y188, V189, G190, F227, W229, L234, and Y318) and one residue in the p51 subunit (E138). As shown in Figure 1(A), the binding pocket accommodates well-packed aryl-aryl interactions with Tyr181, Tyr188, and Trp229. 40 There is room within the hydrophobic allosteric site for potential NNRTIs to venture toward Pro95, located above Tyr181 and below Trp229. Jorgensen and co-workers have worked on the HIV-1 complex for over 20 years, developing bicyclic NNRTIs targeting HIV-1 RT; initially pursuing diarylamines and pyrimidinyl-and triazinyl-amines. 40,41,46,47 In an effort to replace reactive cyanovinyl groups present in early designs, the group used free energy perturbation (FEP) calculations to scan viable scaffolds, and found indoles, indolizines and benzofurans to be the most viable options (Figure 1(B), group 2). 39 Synthetic studies and assays were conducted, and structural characterisation with X-ray crystallography confirmed their findings, showing good positioning in the allosteric binding site. This work uncovered novel, potent NNRTI agents that displayed low cytotoxicity and good solubility.
Following their research with indolizine variants, 39 Jorgensen and co-workers investigated bicyclic replacements to overcome lower potency towards the Y181C viral mutation, subsequently focusing on naphthyl analogues. 40 Using X-ray crystal structures for various catechol diethers bound to wild type (WT) and other HIV-1 RT variants, the group used de novo Here, we chose 10 compounds from the 2-naphthyl dataset 40 (group 1, Table 1) and 12 compounds comprising indoles, indolizines, and benzofurans from earlier work 39 (group 2, Table 2). We focused on these two datasets as group 1 exemplifies small side chain transformations, whilst group 2 covers a much wider potency range, including heterocyclic substitutions that are expected to provide a challenge for force field design. The general structure of each group is displayed in Figure 1(B). For each dataset we can compare computational findings with experimental inhibitory activity measured in MT-2 cell assays, which are presented in Tables 1 and 2. 39,40 In what follows, we report the derivation of force field parameters for the protein and small molecules with the QUBEKit software, and describe our new interface with the SOMD package. We compute relative binding free energies for each of the inhibitors to the HIV-1 RT target, described above, and compare our data with experiment and with the widely used GAFF2 and AMBER force fields.

Protein and Ligand Setup
Simulations for groups 1 and 2 were based on the initial crystal structures with PDB codes 5ter and 4mfb, respectively. The protein was pre-processed using the visualization software Maestro. 48 All residues further than 15Å from the ligand binding site were removed, and dangling bonds were capped using NME and ACE groups. The final system sizes were 376 and 384 residues for groups 1 and 2, respectively. The protein underwent a short minimization and equilibration with backbone atoms restrained. Starting from the ligands in the two crystal structures (that is, compound 1c for group 1, and compound 2a for group 2), initial structures for all 22 ligands were built, and hydrogens added using Maestro.
For direct comparison with QUBE on this data set, we employ the widely-used Am-berff14sb force field for the protein and GAFF2 for the ligands (hereafter referred to as AMBER). BioSimSpace was used to parameterize the proteins and ligands, generating AM-BER/GAFF2 parameters and their charges whilst utilizing the AM1-BCC charge model for the ligands. Each ligand was combined with the corresponding protein, and both unbound ligands and the complexes were solvated using the TIP3P water model with cubic box sizes of 26Å and 88Å respectively. Short equilibration simulations were run using BioSimSpace for every solvated system to produce files ready for use in free energy calculations. Non-bonded parameter derivation was performed with the linear-scaling DFT code, ONETEP, 16 using previously reported protocols. 9 The ground state electron density was computed using an implicit solvent model with a dielectric of 4 to model induction effects in an effective manner. 11 The DDEC module implemented in ONETEP was used to partition the electron density and assign atom-centered point charges and atomic volumes. 9,12,52,53 Partitioning was performed with an IH to ISA ratio of 0.02. Lennard-Jones parameters were derived by QUBEKit from the atomic volumes using the Tkatchenko-Scheffler method. 11,13 No off-site charges were used in this work. 9

Ligand Preparation with QUBEKit
Torsion parameter fitting follows the general methods used in previous studies, 9,18 though here we employ an interface between QUBEKit and the TorsionDrive package, 54 which improves the quality of both QM and MM torsion scans through its recursive wavefront propagation algorithm. QM torsion scans use the same functional basis set as used to derive the bond and angle parameters above. To aid QM optimization and save computational expense, torsional scans were conducted on core fragments of the ligand sets ( Figure 1(C)). The iterative fitting algorithm is described in the Supporting Methods. Non-flexible torsional parameters (e.g. for ring systems) were taken from the OPLS force field as provided by the LigParGen 50 web server. For group 2, torsions involving the cyano group (N-C-C-C) were assigned erroneous atom types by LigParGen, and so these parameters were manually set to zero. Final force fields were output in xml format. A full list of commands for QUBEKit can be found on our Github page (https://github.com/qubekit/QUBEKit).

Protein Preparation with QUBEKit and Sire
The truncated protein target underwent parameterization using the QUBE force field. ONETEP calculations were performed to obtain the ground state electron density, atomic charges and atomic volumes, in the same way as the ligands, as previously described. 18 To facilitate set-up of QUBE simulations for proteins, the software QUBEKit-pro has been developed and is used here to generate OpenMM xml files from pdb and ONETEP output files. Using QUBEKit as a base has the advantage that most features can be easily applied to the protein, allowing for charge checking and symmetrization, for example.
QUBEKit-pro builds the xml by first reading the PDB file to obtain the full topology of the protein or fragment. This provides atomic positions and connectivity, as well as some other information, such as the amino acid sequence and whether the protein is split into subunits. At this stage, certain groups of atoms are picked out for later symmetrization such as hydrogen atoms on the same methyl or amine groups.
With the structure read in, a parametrization step is performed. A stored, general protein xml file contains bond, angle and torsion parameters, which have been generated specifically for compatibility with the QUBE force field 10 and are available for all atoms in standard residues, including NME/ACE caps. This general xml is used to map parameters to the protein in question, using the now stored structure. As described above for the ligands, atomic charges and volumes are extracted from the ONETEP output file, symmetrized (if required), and used to calculate the Lennard-Jones parameters. 11 QUBEKit is then used to write final pdb and xml files for use with OpenMM. If the protein consists of multiple chains, each molecule is returned as a separate file. Since each atom in the protein is in a unique environment, and therefore has unique charge and Lennard-Jones parameters, each atom in QUBEKit-pro is assigned a unique type.
Molecular systems in Sire can be created using AMBER style input files. The molecule is described primarily with two files: the prmtop (or prm7) file that holds all the parameters and the inpcrd (or rst7) that contains the coordinates of the atoms. These files are generated during the parameterization of the molecule of interest with the AMBER force field. Given that all the molecules are now parameterized with the QUBE force field, functionality in Sire was extended to include new features that support parsing of the new pdb and xml input files and support the OPLS-type potential energy functional form. For the former, an algorithm was implemented that reads the parameters and the coordinates from the xml and pdb input files respectively, and returns formatted AMBER files (prm7/rst7) that contain exactly the same information and can be parsed by SOMD. Geometric combination rules have also been implemented in SOMD to support OPLS-style force fields. Further information and benchmarking of single point energies in OpenMM and SOMD are provided in the Supporting Methods.

Free Energy Simulations
BioSimSpace was used to generate files for free energy calculations of the ligands in bound and unbound states. Free energy simulations were run with SOMD using both AMBER and QUBE force field parameterization methods for comparison. Perturbation maps for each HIV-1 RT data set were constructed manually ( Figure S3). For the 10 ligands used in the group 1 data set the free energy map had 32 transitions, whilst the 12 group 2 ligands comprised 42 transitions. In both cases, multiple cycle closures allow assessment of the convergence of the computed free energies.
Each perturbation was carried out in two independent simulations, one in each direction.
Though this increased the overall computational cost, it provided an opportunity to assess the precision of the calculations and check for hysteresis. Each bound and unbound simulation was divided into 11 regularly spaced λ windows. The energy of the system was minimized for 1000 cycles by using the steepest descent method and each λ window was run for a total of 4 ns, using a time step of 2 fs. The first 5% of each simulation trajectory was discarded as equilibration. Free energy changes were calculated from the output using MBAR and thermodynamic integration (TI) methods, as implemented in pymbar which is integrated into the Sire application analyse freenrg. Only MBAR results are presented here for simplicity, though TI data were checked to be similar (see Table S2, for example). The collection of MBAR relative binding free energies was then processed by the software FreeEnergyWorkflows 27 to produce the free energy estimates, and associated errors, reported in this manuscript.
All parameterization and free energy protocols, example input files and output data may be found at <insert github link>.

Results
We begin analysis with group 1, the catechol diether that incorporates a 7-cyano-2-naphthyl substituent. Figure 2(A) shows an overlay of compound 1c from MD simulations using the QUBE force field, with the corresponding crystal structure. As expected, the cyano group projects out below Trp229 into a solvent-exposed channel. The 2-napthyl group maintains close aryl-aryl contacts with Tyr188 and Trp229. 40 Tyr181 has switched orientation from a face-to-face interaction with the catechol diether ring of the ligand, to form closer contact with the 2-napthyl group. However, Figure S4 (Table 1). With R 2 = Me, the Me and Cl substitutions at R 1 are favored over F (1e < 1c < 1b), an ordering which is recapitulated by QUBE while AMBER strongly favors compound 1c. Experimentally, the two compounds (1a and 1d) with R 2 = Cl are equipotent and less strongly bound than 1e. In this case, QUBE obtains the correct sign of the transformation for 1a, but slightly favors compound 1d. In contrast to the promising results above, the two compounds (1i and 1j) with R 3 = F are strongly over-bound with the QUBE force field. The small potency gain expected from the R 3 = H to F transformation (1e to 1j) is modelled well with AMBER, but overestimated by QUBE by around 2 kcal/mol. Figure S5 shows significant movement of the catechol diether group of compound 1j towards the space between Tyr181 and Lys103, which represents a larger change in conformation than was observed for the other inhibitors.
Compound 1i is expected to be the most weakly bound of the 2-napthyl ethers investigated here, but is strongly over-bound by both AMBER and QUBE. Interestingly, we did not include a R 2 = F substituent in our fragment torsion scans (Figure 1(C)), and it may be that the electronegative F atom over-stabilizes the bound conformation in combination with both the QUBE force field parameters and the transferable GAFF library. It is also worth noting that the 2-napthyl ether, with R 1 = R 2 = H, has been shown to adopt an alternative binding mode to the one shown in Figure 2. 40 Compound 1i could be thus expected to pose a challenge if the small R 2 = F substituent causes a large conformational change in the ligand. Table S3 reports a series of repeat runs for several transformations involving compounds 1i and 1j. We do not observe very large differences between repeated runs, but do see large hysteresis between forward/reverse transformations (close to 2 kcal/mol in some cases), which is again indicative of sampling inadequacies.
It has been hypothesized that bulkier groups at the R 1 position in the 2-napthyl ethers might be accommodated, and confer extra benefit in the common Y181C mutant viral strain of HIV-1 RT. 40 Hence, compounds 1f-1h were added to our benchmark data set. Both AMBER and QUBE force fields strongly over-estimate binding in these cases. There is evidence of a change in binding mode of these bulkier ligands (Figure 2 (Table S2), it may be that significantly longer simulation times and/or enhanced sampling are required here. 57 We turn now to group 2, where the 2-napthyl group is replaced by a series of indolizines, indoles, and furans, which exemplify the challenge of optimizing heterocycles in early stage drug discovery. A crystal structure is available for compound 2a, 39 and this is shown in  As before, we have computed the relative binding affinities of each of the group 2 compounds in Table 2 with both the AMBER and QUBE force fields, and compared with available assays. 39   Finally, series of benzofurans (2h and 2i) and indoles (2j-2m) were investigated. Both force fields successfully recover the drop in potency (relative to the indolizines), which is encouraging for future prospective heterocycle scans. QUBE slightly over-estimates the affinity of the two furans, and under-estimates that of the indole with a methyl substitution at R 1 (2k), relative to the other indoles, but is in otherwise very good agreement with experimental trends.

Discussion and Conclusions
In this paper, we have described an interface between the QUBEKit force field engine, and the Sire molecular simulation framework, for the calculation of alchemical relative binding free energies. As an example application, we have retrospectively analyzed the binding of 22 small molecule allosteric inhibitors of HIV-1 reverse transcriptase. As discussed earlier, both AMBER and QUBE perform well for small substitutions on the 2-napthyl of group 1, but uniformly over-bind compounds with bulkier hydrocarbon chains and (to some extent) those with halogens in the R 3 position. With the significant caveat that EC 50 is not a direct measurement of binding, we can estimate experimental binding free energies from the available data. The mean unsigned error in QUBE predictions, relative to experiment, is then 1.85 kcal/mol, which is slightly higher than AMBER (1.66 kcal/mol). It has been shown previously, in the context of Monte Carlo simulations, that enhanced sampling is required to mitigate the effects of quasi-ergodic sampling for some bulky inhibitors in this particular binding site. 58 We propose that the present group 1 set may represent an interesting test case for such sampling protocols.
In contrast, the group 2 set seems to represent a 'well-behaved' test case for free energy methods. Overall, compared with group 1, the conformational dynamics of group 2 compounds retain much stronger agreement with the available crystal structure, which may help to explain the relatively better performance of both QUBE and AMBER on this set.
Relative to experiment, the mean unsigned errors are 1.05 and 1.08 kcal/mol for the QUBE and AMBER force fields, respectively. In fact, we may also compare these results with the CHARMM36/CGenFF force field, which was employed as part of a λ-dynamics protocol by Vilseck et al. 55 For similar indole and indolizine test sets, they report mean unsigned errors of 1.13 and 0.70 kcal/mol, respectively, though note that they did not directly transform between these scaffolds as we have done here. Thus, for both transferable (AMBER and CHARMM) and bespoke (QUBE) force field parameter sets, the errors reported on this set are in agreement with the current consensus on the accuracy of free energy methods in computer-aided drug design. 8,22,38 Despite the relatively similar overall performance of AMBER and QUBE on these datasets, Figure S7 shows that there is little similarity between predictions of individual relative binding free energies using the two force fields. This is perhaps to be expected given the fundamentally very different approaches to parameter derivation. While, at first sight, there is no particular reason to use QUBE over, for example, the AMBER or CHARMM force fields at this time, our bespoke parameter derivation methodologies do offer the potential for substantial improvements in accuracy down the line. By deriving as many force fields parameters as practical directly from QM, rather than fitting to experiment, the process of force field design becomes an exercise in accurately mapping QM data onto physically-motivated parameters and functional forms. 9,59,60 We emphasize that in developing the QUBE nonbonded parameters for protein-ligand complexes such as these, only seven parameters have been directly fit to experiment (the van der Waals radii of seven elements), with the remainder being derived from QM. Thus, future parameter and functional form updates should be relatively straightforward to automate. In contrast, similar adjustments of the functional forms of the aforementioned transferable force fields would be a substantial undertaking, requiring a complete re-fit of all force field parameters to experimental data (admittedly a process that is gradually becoming more achievable through efforts such as the Open Force Field Initiative 61 ). As an example, a polarizable force field, supplemented by higher order

Dihedral Parameter Fitting
An iterative fitting algorithm is employed to optimize QUBE dihedral parameters. First, individual constrained QM optimizations were performed in 15 • increments from -165 • to 180 • using a newly developed interface between Gaussian09 and TorsionDrive. In the first iteration of the fitting process, a BFGS algorithm is used by QUBEKit to minimize the objective function: 1 where n is the number of sampling points, V j is the dihedral parameter to be fit and

Generation of Sire Input Files
This section describes further information about the implementation of the QUBE force field in the molecular dynamics simulation package SOMD. As described in the main text, QUBEKit-pro outputs the force field in an OpenMM style xml file. A xml file stores data in a structure that is both machine-and human-readable. Here, the data is information about each atom of the parameterized molecule, for example, its charges and Lennard-Jones parameters, as well as information about the atom types and the connectivity. To pass this information to Sire and SOMD, an algorithm was implemented, that is graphically described in Figure S1. Figure S1: Graphical representation of the algorithm that reads xml and pdb files and returns AMBER files with the stored information.
The parameters from the xml files are parsed, stored as dictionaries and then assigned as properties to an editable molecule class. Once all the properties are set to match the Sire format, the prm7 file can be exported. The coordinates are read using an internal function of Sire and then saved in the corresponding AMBER rst7 format. Finally, the new AMBER files for the QUBE-parameterized molecule, can be loaded into Sire and used with the same protocols/methodology as previously.
To test the implementation of the algorithm, single point energies were compared for a set of molecules. Ten molecules in vacuum, that were parameterized with QUBE, were loaded both with Sire and with OpenMM. Single point energies were calculated for these conformations using both methods. Figure S2 shows perfect agreement between the energy sets for each molecule, meaning that all the information that was read with the new algorithm is parsed into Sire correctly. The functional forms of force fields are usually quite similar in terms of how bonded and non-bonded interactions are described. However, they are not completely identical.
For example, the combinination rules, which are a set of equations that describe the van der Waals interactions between dissimilar atoms, can differ between force fields. They are expressed as functions of the σ and Lennard-Jones parameters. Let us suppose that atoms i and j are described by σ i and i and σ j and j respectively. For the AMBER force field the Lorentz-Berthelot combination rules are used. 3,4 This means that the σ i and i parameters for the interactions between dissimilar atoms are given by the following equations: However, in this work, an OPLS-AA style potential energy function is used. In order to use the force field function correctly, the OPLS-AA or geometric combination rules should be used: Despite the fact that the option of using geometric combination rules existed in Sire, since the OpenMM API is used, SOMD only supported the Lorentz-Berthelot combinination rules. Consequently, the need to extend the functionality in SOMD to support geometric combination rules is apparent. The SOMD source code was modified in a way that the user can now choose which type of combination rules will be used. In order to test and calibrate the implementation, energetic components of the HIV-1 RT protein are computed both with OpenMM and Sire using the geometric combining rules. Table S1 shows that the energies agree, demonstrating that both the parsing of the QUBE force field to SOMD and the implementation of the geometric combination rules were completed successfully.  Figure S4: MD simulation snapshots of Group 1 compound 1c (orange) compared to the crystal structure (pdb:5ter, blue) to demonstrate the conformational dynamics observed. In snapshot B, the linker adopts a conformation very close to the crystal structure. In snapshots C and D, Tyr181 forms face-to-face and edge-to-face interactions respectively, with the catechol diether group. Figure S5: MD simulation snapshots of Group 1 compound 1j (orange) compared to the crystal structure of 1c (pdb:5ter, blue) to demonstrate the conformational dynamics observed. During the dynamics, there is a movement of the fluorinated catechol diether towards Lys103, and a corresponding rotation of the cyano group towards Phe227. Figure S6: MD simulation snapshots of Group 2 compound 2a (orange) compared to the crystal structure (pdb:4mfb, blue) to demonstrate the conformational dynamics observed. download file view on ChemRxiv manuscript-NelsonBariami-SI.pdf (1.71 MiB)