Fast Equilibration of Water between Buried Sites and Bulk by MD with Parallel Monte Carlo Water Moves on GPUs

Molecular dynamics (MD) simulations of proteins are commonly used to sample from the Boltzmann distribution of conformational states, with wide-ranging applications spanning chemistry, biophysics, and drug discovery. However, MD can be inefficient at equilibrating water occupancy for buried cavities in proteins that are inaccessible to the surrounding solvent. Indeed, the time needed for water molecules to equilibrate between bulk solvent and the binding site can be well beyond what is practical with standard MD, which typically ranges to hundreds of nanoseconds to low microseconds. We recently introduced a hybrid Monte Carlo/MD (MC/MD) method, which speeds up the equilibration of water between buried cavities and the surrounding solvent, while sampling from the thermodynamically correct distribution of states. While the initial implementation of the MC functionality led to considerable slowing of the overall simulations, here we address this problem with a parallel MC algorithm implemented on graphical processing units (GPUs). This results in speedups of 10-fold to 1000-fold over the original MC/MD algorithm, depending on the system and simulation parameters. The present method is available for use in the AMBER simulation software.


ABSTRACT
Molecular dynamics (MD) simulations of proteins are commonly used to sample from the Boltzmann distribution of conformational states, with wide-ranging applications spanning chemistry, biophysics, and drug discovery. However, MD can be inefficient at equilibrating water occupancy for buried cavities in proteins that are inaccessible to the surrounding solvent. Indeed, the time needed for water molecules to equilibrate between bulk solvent and the binding site can be well beyond what is practical with standard MD, which typically ranges to hundreds of nanoseconds to low microseconds. We recently introduced a hybrid Monte Carlo/MD (MC/MD) method, which speeds up the equilibration of water between buried cavities and the surrounding solvent, while sampling from the thermodynamically correct distribution of states.
While the initial implementation of the MC functionality led to considerable slowing of the overall simulations, here we address this problem with a parallel MC algorithm implemented on graphical processing units (GPUs). This results in speedups of 10-fold to 1000-fold over the original MC/MD algorithm, depending on the system and simulation parameters. The present method is available for use in the AMBER simulation software.

INTRODUCTION
Buried protein cavities may accommodate water molecules, and it is generally more favorable to fill a polar cavity with water than to leave it unoccupied. 1 Water exchange between bulk and cavities can occur on a very long time scale, depending on the nature of the free energy surface, and exchange can be particularly slow if these cavities are deeply buried. 2 Furthermore, large conformational changes may be required to allow the water to exchange with the bulk. Standard molecular dynamics (MD) simulations, which integrate Newton's equations of motion, are computationally demanding and can require a very long time to account for large conformational changes, adding to the challenge of efficiently sampling water exchange in buried cavities. 3 Thus, a conformational ensemble from MD that does not efficiently sample the exchange of water between bulk and buried cavities may not be with concordance with the proper Boltzmann distribution associated with the conditions (e.g., solvent composition and temperature) of the simulation. Moreover, a simulation might appear to be converged, while in reality it is stuck in a local minimum and unable to sample important states for the buried water molecules.
As a consequence, properly equilibrating water molecules is essential for attaining accurate results in MD simulations, such as calculating protein-ligand binding free energies with MDbased methods. [3][4][5][6] For example, alchemical methods that decouple the entire ligand from the binding site to compute absolute binding free energies, [7][8][9][10][11][12] or which chemically modify the ligand to compute relative binding free energies, [13][14][15][16][17] may not converge without accounting for resulting changes in the water occupancy of the binding site, especially for deeply buried cavities. 18 This is particularly problematic in drug discovery applications because the system may appear to be converged using standard MD analysis techniques, leading to false positive or false negative predictions.
Pioneering approaches to speed the equilibration of water molecules between bulk solvent and the protein interior have interleaved grand canonical Monte Carlo (GCMC) water moves 19 with standard MD steps, 3,6,18,20,21 and recent work 22 has made this technology available in the OpenMM simulation toolkit. 23 A GCMC move starts with an attempt ("try") to insert/remove a water molecule to/from a random point in the simulated system. Each try is stochastically accepted or rejected with a probability based on its energy and the target value of the chemical potential of water in the system being simulated. Because the random locations at which water moves are tried and potentially accepted can be in the protein interior or in bulk solvent, such moves are able to equilibrate water between the protein interior and exteriors, as desired.
However, the GCMC approach still poses some challenges. One results from the fact that the acceptance probability of a given try depends on the desired chemical potential of water, and additional work is typically needed to establish this value for the temperature, pressure, and composition, of the system being simulated. In addition, the GCMC/MD approach generates trajectories with a time-varying number of water molecules, a feature that can complicate bookkeeping. Finally, the probability of accepting a GC water insertion or removal move is typically low in condensed phase systems, so the method can be inefficient.
We recently introduced an alternative approach to equilibrating water in simulations, where translational MC moves allow water to jump between the protein interior and the bulk, while the number of waters in the simulation remains constant, 24 and furthermore showed that this approach can improve both the precision and the accuracy of calculations of protein-ligand relative binding free energies. 5 This MC/MD approach avoids the need of a GCMC/MD method to determine what chemical potential to use for each set of simulation conditions, and it also avoids any bookkeeping and software issues that might result from the time-varying number of water molecules. Although we took several algorithmic measures to speed the MC/MD method, 24 integration of the MC steps with MD nonetheless markedly slowed the simulations in our initial implementation. We have also collaboratively explored the use of nonequilibrium candidate MC (NCMC) moves to increase the acceptance rate of the large translational MC water moves in this approach, 25 but efficiency remains a concern.
Here, we present a different approach to improving the efficiency of the translational MC method. We describe the design, implementation, and validation of a new version of the standard MC/MD algorithm within the AMBER simulation code 26,27 that utilizes a parallel MC algorithm implemented on graphical processing units (GPUs), resulting in speedups of roughly 10-to 1000-fold over the original MC/MD algorithm. The present method combines an MC method that allows multiple moves to be tried in parallel while still sampling configurations from the canonical distribution, with GPU acceleration of key bottleneck steps in the algorithm.

WORKFLOW
The present algorithm iteratively alternates a block of NMD standard MD steps and a multi-try MC step in which NMC trial moves are processed in parallel, as detailed below. This is similar to our prior algorithm, 24 where blocks of NMD steps alternated with blocks of NMC single-try MC steps. Each block of MD starts from the configuration present at the end of the prior MC step and vice versa. The velocities of all the atoms in the system at the start of each MD block are the same as those at the end of the prior MD block; that is, a moved water molecule carries its velocities. In addition, the orientation of a moved water is not modified, as the orientation of waters in the bulk is randomized by the MD blocks. Trajectory snapshots (atomic coordinates and velocities and system properties such as energy) are saved during the MD block, but not during MC. Steps within the blue frame constitute the parallel MC part (see text for details). MC preparation comprises generation of the steric grid and the grid with the neighbor list for coarse energy calculations. Full energies are computed with the particle-mesh Ewald treatment of long-ranged electrostatics exactly as done during an MD step. Coarse and full energies are computed on the GPU, as are the steric grid and the coarse energy neighbor list.

PARALLELISM THROUGH MULTI-TRY MONTE CARLO STEPS
To facilitate an efficient GPU implementation, we moved from traditional serial MC to a parallel MC algorithm (previously described in the context of waste-recycling MC) in which multiple moves are tried in each step. 28,29 The energies of all the move attempts are evaluated in parallel on the GPU, leading to dramatic wall-clock speed-up.
Each multi-try MC step starts with the system in an initial state indexed as i = 0 and with energy 0 . From this state, additional trial states i = 1…NMC are generated in which a water molecule, chosen with uniform probability from those within a region defined in the frame of reference of a residue, termed the MC region (Section 2.3), is moved to a sterically feasible location chosen with uniform probability within the same MC region. Thus, water moves can only occur within a single region, preserving detailed balance. Sterically feasible locations are identified via a steric grid within the MC region, as previously detailed. 24 For each trial move, a fast but coarse approximation to the energy change, Δ is then computed. If this energy difference is lower than a user-defined cutoff, i.e., Δ < Δ then a full PME energy calculation is done to provide energy . Otherwise, i.e., if Δ > Δ , the energy of this trial state is treated as if it were infinite.
From the energies of the initial (i = 0) and trial (i = 1,…NMC) states, we obtain the following Boltzmann factors where R is the gas constant and T is absolute temperature. Finally, one of the NMC + 1 states (initial or trial) then is selected according to the following probabilities: Note that the rationale for trying only sterically feasible locations and for spending the computer time to compute the full energy only for tries with a low approximate energy is that any very high energy state will almost certainly not be accepted as the new state of the system and so can be discarded without a costly full energy calculation.

DEFINITION OF THE MC REGION
The definition of the MC region is modified from the prior version of our code to a method that is compatible with all types of simulation boxes (e.g., rectangular prism, truncated octahedron). Here, the MC region in which waters are moved with MC steps is a rectangular prism with user-defined dimensions and centered at the center of coordinates of a user-defined set of atoms selected with the AMBER masking function. This selection is defined with the input keyword mcwatmask, and the keyword mcligshift is used to define the length of the grid along each axis, in Å. For example, a value of 10 means that the grid will extend ±10 Å from the center along each axis. Thus, one may enhance water exchange in a buried binding pocket by centering the MC region on a residue bordering the pocket and making the region large enough that it extends into the water outside the protein. If the grid is not defined, the grid dimensions will be the entire simulation box.

GPU IMPLEMENTATION
We transferred multiple components of the MC process from the CPU to the GPU. Most importantly, the values of Δ for all trial moves in the MC step are evaluated in parallel on the GPU.The full AMBER energies are evaluated on the GPU with the PME treatment of long-ranged electrostatics for each trial configuration i with Δ > Δ . As described in Section 3.2, this approach takes advantage of the multi-try MC method. In addition, two setup steps that are time consuming on the CPU are now executed on the GPU. Thus, for both the bit grid based steric grid as well as the subsequent grid-based neighbor list E coarse calculations, hundreds of trial moves can be done in parallel by utilizing the GPU. As previously detailed, 15 the coarse calculation grid covers the entire simulation box and has a coarse spacing set to half the distance cutoff used for nonbonded interactions. Each coarse grid point is assigned a list of the atoms in its corresponding voxel, and is computed by considering the interactions of a water molecule with the atoms in the same and the neighboring two coarse voxels in each direction.

MAINTAINING THE BOLTZMANN DISTRIBUTION
We wished to confirm that the algorithm maintains detailed balance and thus yields a correct Boltzmann distribution.
To do this, we tested the method on a hot water gas for which the acceptance rate for new trial water positions is high, so that any error introduced by the MC procedure would be apparent from a difference in the energy distribution relative to a plain MD simulation of the same system with a suitable thermostat. cavities, using our method on a gas-phase system that is composed of mostly cavities is highly inefficient and is therefore considerably slower compared to MD. However, we think it is the most accurate way to examine the thermodynamic character of the system. We ran eight MC/MD calculations, each with 1.25 × 10 6 cycles, for a total of 10 7 energy calculations. All MC/MD and MD runs started with the same initial configuration of the equilibrated gas but used different random number seeds.

EFFECTS OF PARAMETERS ON EFFICIENCY FOR A PROTEIN-LIGAND COMPLEX
Decreasing the value of E cut speeds the MC step by reducing the number of costly full energy calculations, but it also risks perturbing the results due to excessive reliance on the less accurate coarse energy calculation. We explored this tradeoff, as well as the consequences of varying the number of MC tries per step, NMC for a protein-ligand test system.
We used the Major Urinary Protein (MUP) structure with the PDB ID 1ZNK. 32 The system was constructed using the TIP3P water model, 33  Monte Carlo barostat. 41 The SHAKE algorithm 30,31 was used to constrain hydrogen bond lengths. The standard AMBER protocol for non-bonded interactions was followed. Thus, the particle mesh Ewald (PME) method 42 was used for periodic boundary conditions with an 8 Å cutoff for the short-ranged PME contribution as well as LJ interactions. A long-range continuum correction was used for the dispersive term. 43 Cartesian restraints with a force constant of 5 kcal mol -1 Å -2 were applied to the Cα atoms and ligand atoms during the production run.
For this MUP system, we checked the effects of varying threshold of energy, E cut , to send an MC try from a fast, coarse energy calculation to a slower, full pmemd calculation. 24 The higher the value, more MC tries are sent to full pmemd calculation rather than being immediately rejected, and therefore the runtime is slower. Note that using a higher value can lead to an increase in the acceptance rate, but not a lower acceptance rate. This is because configurations that would otherwise have been rejected get a second chance to be accepted following a full energy calculation, as detailed in Section 4.2. We also investigated the effect of NMC on the results and timings for this system. Thus, the number of MD steps per cycle was held at 10 3 , while NMC was set to the following values: 10 3 , 10 4 , 2.5×10 4 , 5×10 4 , and 10 5 .

HARDWARE USED
All simulations were performed on Nvidia GeForce GTX 1080 Ti GPUs, with an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GH.

APPLICATION TO A BURIED CAVITY IN T4 LYSOZYME
We used our MC/MD method to analyze the equilibrium water density in the moderately polar, buried, cavity 44

THERMODYNAMIC TEST SYSTEM
Our thermodynamic test system, gas-phase water at 500K, allows us to check that we maintain the correct Boltzmann distribution for a system that has a large number of accepted MC tries; here ~57% of the MC steps resulted in a new water configuration. The potential energy distribution provided by the MC/MD method agrees closely with that provided by a pure MD simulation of the same system ( Figure 2A). In addition, the results are highly consistent across all eight MC/MD runs, as shown in Figure 2B, which shows all simulations starting from the same configuration energy, diverging as sampling proceeds, and converging to very similar mean energies. For pure MD, the mean potential energy over 10 7 energy evaluations was −126.8 kcal mol -1 , while the mean potential energy for MC/MD for 10 7 energy evaluations divided into 8 separate 1.25 × 10 6 runs was -127.0 kcal mol -1 ( Figure 2B). The good agreement between pure MD and MC/MD supports the validity of the parallel MC method and its correct maintenance of detailed balance. Hence, it can be used to compute experimental observables, such as binding free energies.

PARAMETER OPTIMIZATION AND EFFICIENCY
We examined the effects of E cut and NMC for the MUP protein-ligand test system. Setting E cut to 15, 12, and 10 kcal mol -1 resulted in 0.002, 0.0006 and 0.0002, of MC tries, respectively, being sent for full energy calculations. However, the fraction of tries accepted does not change significantly between these E cut thresholds ( Figure 3A). This indicates that using the lower threshold tried here, 10 kcal mol -1 is sufficient and that there is no need for a higher, and thus more rigorous, energy threshold, which results in a slowdown of the program, as shown in Figure 3B. In addition, as expected, increasing the number of MC tries (NMC) increases the number of acceptances, though with somewhat diminishing acceptance rates for higher values of this parameter ( Figure 3A). Presumably, there are some MC cycles where the water molecule chosen is particularly stable and/or there are few trial sites available where the water could make favorable interactions with its surrounding atoms.  The present parallel, GPU-based MC implementation runs dramatically faster than our prior serial, CPU-based MC code, which is reported as AMBER20 in Figure 3B This speed enhancement has important implications for the integration of the present technology into absolute and relative binding free energy calculations. Thus, for one of our prior relative binding free energy calculations (the W1→W2 perturbation, decharge windows), using MC/MD to equilibrate a single λ window took ~9 hours, 5 while in the current version, this only requires ~30 seconds, a 1000x speedup.

ILLUSTRATIVE APPLICATION TO A CAVITY IN T4 LYSOZYME
The present MC/MD method may be applied to study the hydration of buried cavities. Here, we demonstrate this use case for the engineered L99A/M102Q mutant of T4lysozyme cavity. 44 Simulations with MC/MD and pure MD were initiated with no water molecules in cavity ( Figure 4A). The MC/MD simulations provided a smooth, physically plausible water distribution within the cavity ( Figure 4B), while water did not penetrate to the cavity in a pure MD simulation ( Figure 4C).

DISCUSSION
The present MC/MD method achieves speedups of 10-1000x over our prior version by using a parallel MC algorithm that is effectively ported to the GPU. A further update allows it to support all simulation box types by defining the steric grid not in terms of the boundary but in terms of the distances from a molecular component of interest, such as a ligand or a residue at the edge of a binding pocket. The ability to equilibrate buried water molecules with minimal slowdown means this can be integrated with alchemical free energy calculations and performed as part of any system setup to establish preferred locations of water molecules. This is important as it solves the need for a priori knowledge regarding the hydration state of the cavity and allows for water sampling in the context of a flexible protein. It may also find application for structural optimization of water molecules in crystallographic or cryo-electron microscopy structures, and to accelerate sampling of structural processes that involve the penetration of water into the protein interior, potentially including protein conformational changes or full denaturation. It may be thought that the acceptance rate would be higher for GCMC than for the present MC method, because a trial move for GCMC does not require paying the penalty of removing water from its initial location. In fact, this should not be the case, as the acceptance probability for a move to a given location with GCMC should be the same as the mean acceptance probability to the same location from a simulation of bulk water having the same chemical potential as that used in the GCMC calculation.
A typical MC simulation of water makes small incremental water moves, with correspondingly small energy changes, and thus can have a large acceptance probability for these moves. In contrast, here the whole point is to make large translational MC moves that allow jumps between the protein cavity and the surrounding water. This leads to a rather low acceptance rate and motivates efforts to maximize efficiency. Following the marked advance in speed described here, possible future directions may include omitting moves that would be only within the surrounding water or within the cavity, as these are not very interesting and are accounted for by the MD steps, while being careful to avoid any possible violations of detailed balance. Another direction is to allow small clusters of water to move together in order to increase the rate of acceptance for moves into large nonpolar cavities. However, the present method is already highly efficient, and is currently available for use in the AMBER simulation software.

ACKNOWLEDGEMENTS
MKG acknowledges funding from National Institute of General Medical Sciences (GM061300 and GM100946). These findings are solely of the authors and do not necessarily represent the views of the NIH. MKG has an equity interest in and is a cofounder and scientific advisor of VeraChem LLC. CL, BKR, and WS are employees of Roivant Discovery.