DeepWEST: Deep learning of kinetic models with the Weighted Ensemble Simulation Toolkit for enhanced sampling

Recent advances in computational power and algorithms have made molecular dynamics (MD) simulations reach greater timescales. However, for observing conformational transitions associated with biomolecular processes, MD simulations still have limitations. Several enhanced sampling techniques seek to address this challenge, including the weighted ensemble (WE) method, which samples transition between metastable states using many weighted trajectories to estimate kinetic rate constants. However, initial sampling of the potential energy surface has a significant impact on the performance of WE, i.e., convergence and efficiency. We thereby introduce deep-learned kinetic modeling approaches that extract statistically relevant information from short MD trajectories to provide a well-sampled initial state distribution for WE simulation. This hybrid approach overcomes any statistical bias to the system as it runs short

0][21] Timesteps for MD simulations are restricted within the femtosecond range to correctly integrate the equations of motion as they cannot exceed the time period of highestfrequency thermal oscillation.Biologically relevant systems undergo complex conformational transitions, which are essential to their functions.It is challenging to capture these transitions through conventional MD, especially if they are relatively slow processes (milliseconds or longer), involve large-scale conformational rearrangements, or include protein association and/or (re)folding.6][27][28][29] One category of enhanced sampling methods adds a bias potential to the potential energy surface (PES) that decreases the energy barrier of transition between metastable states and accelerates the conformational search.We could roughly categorize these methods as collective variable (CV)-based and CV-free enhanced sampling methods.CV-based enhanced sampling approaches include and are not limited to metadynamics (metaD), 30,31 variationally enhanced sampling, 32 and Markov state models (MSMs) 33,34 while CV-free enhanced sampling methods include parallel tempering or replica exchange molecular dynamics (REMD), 35,36 selective integrated tempering, and Gaussian accelerated molecular dynamics (GaMD). 37All the previously mentioned enhanced sampling approaches are capable of accelerated PES conformational search.Thereby, they prove to be effective in extensive thermodynamic sampling.On the contrary, such enhanced sampling approaches add a bias potential to the system's potential energy and thereby lead to altered dynamics.
[40][41][42] On the other hand, several path sampling methods exist for an extensive sampling of kinetic properties, which are broadly divided into complete path sampling and segment-based sampling.Transition path sampling (TPS) and dynamic importance sampling (DIMS) are based on complete reactant to product path sampling.Segment-based sampling approaches are based on splitting the reactant to product path into several segments or "bins" where bin-to-bin transitions are sampled by running independent MD simulations of fixed duration within these bins.These methods include and are not limited to weighted ensemble (WE) methodology, 43 milestoning approaches, [44][45][46] adaptive multilevel splitting (AMS), [47][48][49] and transition interface sampling (TIS). 50These path sampling methods typically focus on sampling the transition regions, and thereby, an entire PES of the system is often neglected.The WE method enhances the sampling of rare events by running independent and unbiased parallel simulations in short configurational spaces or "bins".These simulations communicate through each other through replication and resampling, leading to the precise estimation of kinetic observables.The WE method is further explained in the section 2.2.
Our recent work demonstrated the effectiveness of a hybrid enhanced sampling method, the GaMD-WE method, which combines GaMD and WE for an extensive sampling of both the thermodynamics and kinetics of biological systems of interest. 51GaMD is performed initially to sample the PES of the system by applying several boost potentials and is followed by reweighing to recover the original PES.Configurations selected from the recovered PES are the starting structures for the WE simulations.Several independent boost potentials are applied in parallel GaMD simulations and are reweighted accordingly to recover the PES of the system.The GaMD run with the most PES coverage provides the starting structures for WE simulations.We aim to further decrease the computational cost of thermodynamic and kinetic sampling of systems of interest by running unbiased MD simulations and implementing deep learning with the Markovian variational approach for faster and enhanced hybrid sampling. 52,53This way, we can bypass the entire reweighting process to recover the original PES.
Providing a well-sampled initial state distribution to WE simulations is often a concern.
Generally, multiple starting structures are provided at the beginning of a simulation, and they often fail to establish extensive communication with each other, thereby leading to an initial bias in the simulation.Markov State Models (MSMs) have shown to be successful in generating equilibrium distribution and thereby removing the initial bias in the simulations. 54e history-augmented Markov state model (haMSM) is another efficient method for the removal of bias in stationary distributions, and it has been implemented recently in WE simulations, where a steady state analysis is performed, and the weights are distributed accordingly to restart the simulation. 55In short, DeepWEST is an attempt to construct MSMs from short MD simulations, providing a well-sampled initial distribution for the WE simulations.We propose a hybrid method that uses the variational approach for Markov principle (VAMP) and neural networks to identify metastable states from unbiased and short MD simulations as a precursor for running WE simulations.Selected conformations from these states then serve as starting structures for WE simulations to further sample the kinetics and thermodynamics of the system.This hybrid methodology further reduces the computational cost as compared to our previously developed hybrid GaMD-WE method, accelerates the WE approach further by providing well-sampled initial configurations, and provides a better comprehensive picture of the thermodynamic and kinetic properties of the systems of interest by introducing no statistical bias to the free energy landscape to the system.In the forthcoming sections, we will describe the WE method, the VAMP approach, and the hybrid method in detail.We will also demonstrate the capability of the DeepWEST approach to sample kinetic and thermodynamic properties faster than the WE method alone and compare the findings with the already established GaMD-WE approach.

Variational Approaches for Markovian Processes
Markovian processes are stochastic processes where the future state of the system, x t+τ depends only on the current state, x t .Here, t is the time step, and τ is the lag time.Various Markov modeling approaches have been developed recently to extract key information for complex dynamical processes.These methods include and are not limited to Markov state models (MSMs), [56][57][58] Markov transition models, variation approach to conformational dynamics (VAC), 59 time-lagged independent component analysis (TICA), 60 variational diffusion maps, 61,62 and variational approach for Markov processes (VAMP). 52,63Dynamical systems often display high non-linearity in their system coordinates.To analyze non-linear and high-dimensional dynamical systems, we employ the Koopman operator, K, that linearly transforms the vector space spanned by observables in the form of time-series data generated by MD simulations.5][66] A majority of the Markovian modeling approaches exploit the fact that there exists a non-linear transformation of these features such that the dynamics can be approximated as a linear Markov model. 67,68Let χ 0 and χ 1 be the feature transformations for the trajectory coordinates x t and x t+τ respectively, and E be the expectation value such that the matrix K determines the dynamics of the system according to equation 1.According to the VAMP theory, when the subspaces spanned by the features, χ 0 and χ 1 , are identical to the top left and top right singular functions of K, we obtain the best finite-dimensional linear model.
To generate well-sampled initial conformations for WE simulations, we resort to a neural network architecture that employs the variational approach for Markov processes (VAMP), known as VAMPnets, which is a non-biased trajectory learning approach towards faster estimation of kinetics and thermodynamics of systems of interest as compared to other hybrid approaches. 691][72][73] Featurization is the process where the simulation data is often subjected to the removal of translational and rotational motion and/or transformed into internal coordinates.5][76] However, initial steps of featurization, such as choosing appropriate metrics for training the trajectory dataset such as using the cartesian coordinates or internal coordinates, selecting suitable CVs that describe the conformation space such as dihedral angles RMSD, the radius of gyration, removal of solvent coordinates, selection of heavy atoms, and realignment are to be determined before training the trajectory using VAMPnets.The resultant metric space is then discretized to fewer states, and the process is called discretization. 71,77,78Finally, a coarse-graining of the Markov state model (MSM) is performed since the internal dynamics of a particular set of microstates can be faster than the modeling timescales. 79,80All of the above steps involved in the process of generating MSMs are performed by VAMPNets that learn the non-linear collective variables or reaction coordinates that separate the metastable states of biological systems.
A simple VAMPnets model comprises two parallel neural network architectures that are employed to learn feature transformations using the VAMP approach.Each network receives the initial MD trajectory coordinates and time-lagged correspondent of the same coordinates, x t and x t+τ respectively.Non-linear dimensionality reduction is then performed on these two sets of trajectory inputs for each time step, t.As per the VAMP principle, a VAMP-2 score is defined that attains its maximum value when the top left and right components of the Koopman operator K are equivalent to the subspaces spanned by these features. 52Deployed neural networks are trained to maximize the VAMP-2 score in order to achieve the best finitedimensional linear model.In the above process, it achieves the segregation of the trajectory frames and assigns them to particular clusters or Markov states, which then accelerates the process of rare-event sampling and transition state analysis.Key features for VAMPnets include the choice of the lag time, the number of output nodes, and the network depth of the architecture.Few pre-optimization runs with varying lag times were performed on the simulation trajectory to find the lag time that could resolve the slowest processes.The choice of a lag time relies upon the eigenvalue decomposition of the Markov propagators.When selecting a large lag time that exceeds the timescale of the slowest processes, it becomes difficult to fit the noisy data.However, a short lag time leads to them getting stuck in one of the suboptimal maxima of the training score.The selection of output nodes represents the separable metastable regions from the trajectory data.A higher number of output nodes may not be suitable for small trajectory data since the clustering will lead to discretizing the transition regions.The clustering of metastable states also depends heavily on the network depth of the architecture.A deeper network describes complex functions and is more difficult to train.

Weighted ensemble method
Standard unbiased MD simulations are limited by the timescale and infrequency of significant biological events.Such events are rarely captured in the trajectory data, and it becomes challenging to analyze further and estimate the kinetics of such transitions.The weighted ensemble simulation approach is an enhanced sampling method designed to sample such rare occurrences or transitions such as protein-protein association reactions and conformational changes by replicating and resampling an ensemble of weighted trajectories. 43,81The whole configurational space is subdivided into macrostates or "bins" that sequentially lead the system from an initial state to the target state.Many short simulations or "walkers" carrying probabilities or "weights" are run and the system evolves throughout the simulation through "resampling" that ensures an equal number of trajectories in each bin, i.e., by splitting or merging walkers.These walkers explore the configurational space extensively, eventually sampling rare transitions.Appropriate progress coordinates are used to define the bin boundaries.Transitions between these bins are recorded while the system evolves in time to estimate the kinetics and thermodynamics.The WE method has been demonstrated to accurately estimate the kinetics of rare events for multiple biological systems. 82,83sampling is a crucial component of the WE approach that leads to the statistically unbiased future evolution of the system. 82,84,85Walkers are resampled after every iteration via appropriate replication and reassignment of weights.Weighted trajectories are initiated from assigned bins and are propagated for a short interval, τ .These trajectories are either replicated where there are too few trajectories or deleted where there are more than the required number of trajectories per bin.Since there is no statistical bias in the simulation, thermodynamic and kinetic properties can be directly obtained by the evolution of the weights of the walkers in each bin.Simulation results can be periodically checked for convergence by estimating rate constants for various possible transitions between the macrostates.
This approach also eliminates the need for choosing the resampling time, τ based on the Markovian property and thereby provides the flexibility to select τ based on the system size.

DeepWEST
The deep learning of kinetic models with the Weighted Ensemble Simulation Toolkit (Deep-WEST) method aims to provide well-sampled initial distribution to WE simulations.It is achieved by running a relatively short MD simulation and processing the trajectory data through the VAMP approach by employing neural networks.Initial conformations for WE simulations and their probabilistic values are extracted from the resultant MSMs.WE simulations are then run with this distribution, generating a more refined free energy landscape closer to the steady-state distribution.We have developed a DeepWEST package that au-tomates the entire process of running MD simulations for trajectory analysis, deep learning of kinetic models for generating MSMs through VAMPNets, and well-sampled initial conformations for running WE simulations.The entire workflow can be described below: 1. Systems of interest are downloaded from the Protein Data Bank (PDB) server and are prepared for MD simulations using appropriate force field parameters, periodic box vectors, and solvation using the Amber 14 package. 87 Once the system is prepared, it is followed by energy minimization, simulated heating, and equilibration using the OpenMM simulation engine.88 3. MD simulation is then performed using the Amber simulation engine for the desired amount of simulation time.Trajectories are saved with the desired frequency.
4. The trajectory data is subjected to featurization, dimensionality reduction, discretization, and kinetic modeling using VAMPnets to generate metastable states.
5. VAMPnets deploy a "fuzzy clustering" of MD trajectories in "n" output states.Each conformation in the MD trajectory has a probability for each of these output states that sum to one.The cluster with the maximum likelihood for each conformation is the one that is assigned to the conformation.Post clustering, conformations within each cluster are further binned based on one of the CVs (dihedral angle, ϕ, in the case of alanine dipeptide and RMSD in the case of chignolin).From each subcluster, a certain number of conformations are selected from each bin to generate a well-sampled initial state sampling for WE simulations.Once the conformations are chosen from these clusters, they are assigned the weights based on their fraction of the population of the cluster to which they belong.
6. WE simulation folder is created with all the required starting structures, weights, and topology files to run a WE simulation.To avoid any steric clashes during the initial of heavy atoms (N, C, O, and C α ) followed by minimization of the entire system.
7. WE simulations are then run with the starting structures with desired probabilities assigned for each conformation.
The Results section shows that our method estimates the kinetics of the systems of interest, i.e., alanine dipeptide and chignolin, more quickly and more accurately than the WE method alone.Long-scale unbiased MD simulations or brute force methods are often used as a reference for evaluating the performances of new methods on model systems. 89Five independent brute force simulations each of 2 µs for alanine dipeptide and 4 µs for chignolin were performed to obtain the reference values of rate constants over aggregate simulation time.DeepWEST performs equally well and often surpasses the brute force and the hybrid GAMD-WE approach with the advantage of being computationally inexpensive and easy to implement while adding no statistical bias to the system.

Results
To demonstrate the effectiveness of the current approach, we have tested the DeepWEST method on three systems, namely alanine dipeptide with explicit solvation (Figure 1a), chignolin with implicit solvation (Figure 1b), and the NTL9 protein with implicit solvation (Fig- ure 1c).Kinetics and thermodynamics obtained from these systems using our DeepWEST approach are compared against the WE and our previously developed hybrid GaMD-WE approach.For the systems mentioned above, we demonstrate that the DeepWEST approach surpasses the WE approach in estimating the rate constants between the metastable states of interest and even outperforms the already established hybrid GaMD-WE approach in many occurrences.We also demonstrate that DeepWEST samples the free energy landscape equivalently compared to the already established approaches.Error bars representing 95% confidence intervals are calculated for all WE, GaMD-WE, and DeepWEST runs.Simula-tions are run until there is no significant change in the average rate constant or "convergence" is reached.Rate constants are plotted with an approximate interval of 50 iterations for each run.
Figure 1: Model systems tested by WE, GaMD-WE, and DeepWEST approaches

Alanine dipeptide
[92] It is a 22-atom system with an acetyl group at the N-terminus and N-methyl amide at the C-terminus (Figure 1a).Initial coordinates for alanine dipeptide were obtained from http://ftp.imp.fu-berlin.de/pub/cmb-data/alanine-dipeptide-nowater.pdb.AM-BER ff14SB forcefield was used to prepare the system for MD simulations. 93It was then subjected to explicit solvation using the TIP3P water model. 94Alanine dipeptide was sub-jected to 50 ns of unbiased MD simulations.Trajectories were trained using VAMP neural networks with a time lag of 80 ps, training ratio of 0.9, and a batch size of 1000 that learned discretization in three metastable states.A list of hyperparameters used for network training is provided in Table 1.Starting structures were selected from the metastable states as per the DeepWEST protocol, followed by three independent WE simulation of 12 µs each with a total simulation time of 36 µs.Resampling time, τ was kept to be identical as used in WE and GaMD-WE runs, i.e., 10 ps. 51,82CVs were set to be the dihedral angles, ϕ and ψ, which were evenly spaced in bins of 0.17 rad with ϕ and ψ ranging between [-3.14 rad, 3.14 rad ].
The target number of walkers per bin, n w was set to be 4 for the WE simulation.To compare the performance of the DeepWEST method with the WE and the hybrid GAMD-WE method, we have identified three metastable states in alanine dipeptide as shown in Figure 2(a).Note that Figure 2(b) shows the three metasbale states when the simulation time was 250 ns.For the DeepWEST method, we have used the 50 ns simulation time as an input trajectory to be trained by the network architecture.The metastable states were chosen in such a way that each state represents an important region of the PES.These regions are α R , α L , and P II .α R lies within the α region while P II lies in the β region of the   Two different WE approaches, namely the conventional WE and the hybrid GaMD-WE approach, were compared to assess the performance of our newly developed DeepWEST Table 2: Rate constants (ns −1 ) between different metastable states of alanine dipeptide obtained after 12 µs of aggregate simulation time.In the case of brute force simulation, the first value indicates the average rate constant while the value within the brackets indicates the 95% confidence interval computed using Bayesian bootstrapping.For WE, GaMD-WE, and the DeepWEST methods, the error bars represent 95% confidence intervals obtained from the standard deviation of three independent runs.6: Evolution of rate constants over aggregate simulation time for brute force simulation (black), WE simulations (red), GAMD-WE approach (blue), and DeepWEST approach (green).For WE, GaMD-WE, and the DeepWEST methods, the error bars represent 95% confidence intervals obtained from the standard deviation of three independent runs.approach.WE was run for a total simulation time of 36 µs averaged over three independent runs of 12 µs each.Similarly, the GaMD-WE approach was run for 50 ns of GaMD followed by 11.95 µs of WE, totaling a simulation time of 36 µs averaged over three independent runs of of 12 µs each.Likewise, the DeepWEST approach was run for an unbiased MD simulation of 50 ns followed by 11.95 µs of WE simulation, totaling a simulation time of 36 µs averaged over three independent runs of 12 µs each.Figure 2 demonstrates the ability of the deep-learned kinetic modeling approach to cluster the MSMs and also ensures that adequate sampling has been achieved to select starting conformations.The average free energy is given by -k B T lnP where k B is the Boltzmann constant, T is the temperature, and P is the probability.Figure 4a and Figure 4b represent the average free energy profiles of alanine dipeptide for the WE and the hybrid GaMD-WE approach, respectively, while Figure 4c represents the free energy profile for the DeepWEST approach.Figure 4c shows almost identical PES coverages obtained through the DeepWEST approach as compared to the WE (Figure 4a) and the hybrid GaMD-WE approach (Figure 4b).This, in turn, demonstrates the ability of the DeepWEST approach in accessing the entire free energy landscape of the system with enough sampling in the P II , α L , and the right-handed α-helix or α R region.Figure 5 demonstrates the average free energy profiles of alanine dipeptide from three separate runs of WE, GaMD-WE, and DeepWEST simulations at different stages the simulation.
To explore the performance of the DeepWEST approach in estimating kinetic rates, brute force simulations are carried out to obtain reference values.Three independent 12 µs simulations were run, and Bayesian bootstrapping was performed for 95 % confidence intervals.
Rate constants were obtained from different methods, i.e., the brute force calculations, WE, GaMD-WE, and the DeepWEST method for transitions between metastable regions of interest (Table 2).For the transitions between the central metastable region, α R to P II , the DeepWEST method outperformed both the WE and the GaMD-WE methods in estimating the rate constant.The rate constant for the DeepWEST method converged faster compared to the brute force value (Figure 6c).However, as observed in Figure 6d, all the methods slightly overestimated the rate constant for the transition from P II to α R .In the transitions from α L to P II (Figure 6a), P II to α L (Figure 6b), α R to α L (Figure 6e), and α L to α R (Figure 6f), comparable convergence and accuracy of the kinetic rates were observed for the DeepWEST approach as compared to the WE and the GaMD-WE methods.

Chignolin
Chignolin (PDB ID: 1UAO), a designed protein, is a model system consisting of 10 amino acid residues (GYDPETGTWG) (Figure 1b). 95It forms a stable β-hairpin structure in implicit solvent, and MD simulations reveal the slow unfolding of chignolin. 96Initial coordinates for chignolin were obtained from https://files.rcsb.org/download/1UAO.pdb1.gz.8][99] AMBER ff14SB force field was used to prepare the system for MD simulations. 93ignolin was subjected to 100 ns of constant volume unbiased MD simulation with a collision frequency of 1 ps -1 , employing a Langevin thermostat at a constant temperature of 300 K. To interpret the kinetics of a system that demonstrates folding and unfolding behavior, we encounter the problem of trajectory alignment with respect to a unique reference structure.Moreover, a large amount of noise would be introduced while the networks transform the data via rotations and translations.Hence for the chignolin system, internal coordinates were chosen as a network input.Nearest-neighbor heavy-atom distances between all nonredundant residues separated by two or more residues served as the network input resulting in 28 nodes.MD trajectories were trained using VAMP neural networks with a time lag of 40 ps, training ratio of 0.9, and a batch size of 1000 that learned discretization in three metastable states by performing a hierarchical decomposition of the state space.A list of hyperparameters used for network training is provided in Table 3. Starting structures were selected from the metastable states as per the DeepWEST protocol, followed by three independent WE simulations of 40 µs each with a total simulation time of 120 µs.Resampling time, τ was kept to be identical as used in WE and GaMD-WE runs, i.e., 20 ps. 51CVs for WE simulations were set to be mass-weighted root-mean-square deviation (RMSD) and mass-weighted radius of gyration (R g ) respectively.Both CVs were evenly spaced in bins of 0.02 nm ranging between [0 nm, 1 nm].The target number of walkers per bin, n w was set to 4 for the WE simulation.To compare the performance of the DeepWEST method with the hybrid GaMD-WE and WE methods, we have identified three metastable states in chignolin as shown in Figure 7.
These regions are folded, unfolded, Intermediate I (I 1 ), and Intermediate II (I 2 ).The folded region is defined as RMSD ≤ 0.20 nm, while the unfolded region is defined as RMSD ≥ 0.55 nm.I 1 is defined to be 0.20 nm ≤ RMSD ≤ 0.30 nm and 0.425 nm ≤ R g ≤ 0.525 nm while I 2 is defined to be 0.60 nm ≤ RMSD ≤ 0.70 nm and 0.70 nm ≤ R g ≤ 0.80 nm.
Chignolin represents a common protein structural motif that undergoes folding and unfolding during brute force MD simulations.We tested the DeepWEST method by estimating rate constants between regions of interest, especially in the folded and unfolded regions.WE was run for a total simulation time of 120 µs averaged over three independent runs of 40 µs each.Similarly, the GaMD-WE approach ran six independent 500 ns of GaMD with varying boost potentials, and starting conformations were selected from the GaMD run that had the largest PES coverage.It was then followed by 39.50 µs of WE simulations, totaling a simulation time of 120 µs averaged over three independent runs of 40 µs each.Likewise, the Table 4: Rate constants (ns −1 ) between different metastable states for chignolin obtained after 40 µs of aggregate simulation time.In the case of brute force simulation, the first value indicates the average rate constant while the value within the brackets indicates the 95% confidence interval computed using Bayesian bootstrapping.For WE, GaMD-WE, and the DeepWEST methods, the error bars represent 95% confidence intervals obtained from the standard deviation of three independent runs.DeepWEST approach ran an unbiased MD simulation of 100 ns followed by 39.90 µs of WE simulation, totaling a simulation time of 120 µs averaged over three independent runs of 40 µs each.Figure 8 demonstrates the ability of the deep-learned kinetic modeling approach to cluster the MSMs for chignolin.Initial conformations were selected from these metastable states for WE simulations.Figure 9a and Figure 9b represent the average free energy profiles of chignolin from the WE and the hybrid GaMD-WE approach, respectively, while Figure 9c represents the free energy profile from the DeepWEST approach.Figure 10 demonstrates the average free energy profiles of chignolin from three separate runs of WE, GaMD-WE, and DeepWEST simulations at different stages the simulation.The PES coverages for chignolin from the three approaches are comparable and demonstrates that the DeepWEST method can sample the thermodynamics of the system accurately.
To assess the performance of the DeepWEST approach with respect to WE and the hybrid GaMD-WE approach, we estimated the convergence of rate constants between metastable states, especially defined within folded and unfolded regions of chignolin (Table 4).The  Figure 11: Evolution of rate constants over aggregate simulation time for brute force simulation (black), WE simulations (red), GAMD-WE approach (blue), and DeepWEST approach (green).For WE, GaMD-WE, and the DeepWEST methods, the error bars represent 95% confidence intervals obtained from the standard deviation of three independent runs.two intermediate regions, I 1 and I 2 are separated apart in the PES, and convergence of rate constants between these regions is difficult to achieve. Figure 11c demonstrates faster and more accurate convergence of the DeepWEST approach as compared to the WE and GaMD-WE approach for the transition from I 1 to I 2 .However, the GaMD-WE method outperformed the DeepWEST method in estimating the rate constant for the transition from I 2 to I 1 (Figure 11d).The DeepWEST method showed faster convergence and greater accuracy in estimating the rate constants between the I 2 and the folded states (Figure 11e and 11f).Faster convergence for the DeepWEST method was also observed from the folded to the unfolded state (Figure 11a).However, the GaMD-WE and the DeepWEST method equally outperformed the WE method in estimating the rate constant in the transition from the unfolded to the folded state (Figure 11a).In conclusion, for all the cases, both the hybrid methods, especially the DeepWEST method, outperformed the WE method in achieving faster convergence and accuracy of rate constants between the metastable regions of interest.

NTL9
To further test the capabilities of the DeepWEST method for a higher dimensional problem, we chose the N-terminal domain of ribosomal protein L9 (NTL9) system.The NTL9 protein is a 627-atom system consisting of 39 amino acid residues (Figure 1c) which has a folding time of ≈ 1.5 ms. 100 Initial coordinates for the unfolded NTL9 protein was obtained from https://github.com/westpa/westpa2_tutorials/blob/main/tutorial-3.
3/istates/ntl9.pdb.8][99] AMBER ff19SB force field was used to prepare the system for MD simulations. 101The NTL9 protein was then subjected to 10 µs of constant volume unbiased MD simulation with a collision frequency of 1 ps -1 , employing a Langevin thermostat at a constant temperature of 300 K. Following the folding and unfolding behavior of the NTL9 system, internal coordinates were chosen as a network input.Nearest-neighbor Figure 12: Three metastable states of the NTL9 system as an output of VAMPNets.The upper panel shows the mean contact map for each metastable state while the lower panel shows the 3D representation for each of the states.
heavy-atom distances between all non-redundant residues separated by two or more residues served as the network input resulting in 666 nodes.MD trajectories were trained using VAMP neural networks with a time lag of 60 ns, training ratio of 0.9, and a batch size of 1000 that learned discretization in three metastable states by performing a hierarchical decomposition of the state space (Figure 12).Starting structures were selected from the metastable states as per the DeepWEST protocol followed by the WE simulation of 90 µs with a resampling time, τ of 40 ps.Therefore, the total simulation time for the DeepWEST method was 120 µs.CVs for WE simulations were set to be mass-weighted root-mean-square deviation (RMSD) and mass-weighted radius of gyration (R g ) respectively.Both CVs were evenly spaced in bins of 0.05 nm ranging between [0 nm, 2 nm].The target number of walkers per bin, n w was set to 8 for the WE simulation.Similarly, the conventional WE simulation was run for 130 µs with the same parameters (i.e.resampling time, CVs and binning parameters) as described previously for the DeepWEST simulation.The WE sim-ulations initiated from the unfolded state to the folded state with the CVs set to be the mass-weighted RMSD and mass-weighted R g respectively.Since the reference structure for computing RMSD is the initial unfolded state, RMSD is expected to increase during the simulation as the system folds.The unfolded region is defined as RMSD ≤ 0.60 nm, while the unfolded region is defined as RMSD ≥ 1.0 nm.For the same amount of simulation time, i.e., 130 µs, the DeepWEST method proved more effective than the conventional WE in covering the free energy landscape of a more complex protein, i.e., the NTL9 system.Figure 13a and Figure 13b represent the free energy landscapes for the NTL9 system for the DeepWEST and the conventional WE method, respectively, where the lowest energy state was set to zero.It is evident from Figure 13a that the DeepWEST approach is more effective than the conventional WE in exploring other metastable states in the NTL9 system, particularly in the unfolded region.The rate constants between the unfolded and the folded states were not measured, but we expect the DeepWEST method to outperform the conventional WE method in kinetic rate measurements.

Discussion
The WE method predominantly depends on appropriate CVs to sample the system.In most cases, the best choice of CVs is unknown.3][104][105] Initial sampling of the PES has a significant impact on the accuracy and convergence of WE.Our previously developed hybrid GaMD-WE method addresses the concern of substantial initial sampling but reweighing is an additional concern associated with it.To obtain the correct PES of the system, GaMD resorts to reweighing, and a substantial amount of energetic noise is introduced with an increase in the system size.In that case, GaMD needs to be run for a longer timescale for varying degrees of boost potentials, increasing the overall computational cost drastically.As demonstrated through previous examples, the newly developed DeepWEST approach is a powerful method that obtains both the kinetics and thermodynamics of the system.Compared to our earlier developed hybrid approach, it is a hassle-free method for enhanced sampling prior to running WE simulations.First, DeepWEST eliminates the process of running various accelerated MD simulations with varying degrees of boost potentials.
Second, it also eliminates the process of PES reweighing to uncover the original energy landscape by running unbiased MD simulations.Most importantly, deep neural architectures learn the clustering of metastable states from short MD trajectories which could be further processed to extract starting conformations for WE simulations.Both methods involved in the DeepWEST approach add no statistical bias to the system, and thereby, kinetics and thermodynamics obtained from this approach are exact.However, a longer MD simulation time may be required to discretize metastable states as the system size increases to provide statistically relevant starting conformations for WE simulations.Recent improvements in the WE method have been made to achieve faster kinetics. 55,104An even more accurate and faster kinetics and thermodynamic sampling could be achieved if DeepWEST is combined with these improvements in WE.
Currently, we have shown the DeepWEST method to work on simpler systems, i.e., alanine dipeptide in explicit solvation and chignolin in implicit solvation.We have also shown a greater free energy landscape coverage for the NTL9 protein for the DeepWEST method against the conventional WE method.However, extending this method to more complex biological systems comes with challenges.Deep neural networks learn feature transformations and cluster trajectories based on input CVs.The network architecture for such systems primarily depends on the choice of time lag and the number of metastable states we want to cluster.Identifying the slowest processes in complex systems comes with the choice of accurate CVs, which in most cases is unknown.However, recent developments in enhanced MD sampling and MSM approaches would be worth considering.Implementing multi-ensemble Markov models (MEMMs) that conduct large ensembles of even shorter simulations and further accelerate trajectories by adding boosted potentials through various accelerated enhanced sampling methods in DeepWEST could aid the clustering of complex proteins. 58,106plementation of hydrogen-mass repartitioning (HMR) for solvated systems to accelerate the dynamics would prove indispensable in faster learning of metastable states. 27

Conclusion
A hybrid approach for a faster learning of kinetics combining the deep learning of MD trajectory with the WE simulations is presented here.It employs a deep learning architecture to sample statistically relevant conformations representative of the rare events from short MD simulation trajectories.The method is tested on three different model systems, namely, alanine dipeptide with explicit solvation, chignolin with implicit solvation, and the NTL9 protein with implicit solvation.Our method significantly outperforms the WE and the GaMD-WE approach for the chignolin model system, and is comparable in performance for the alanine dipeptide system.Because of its reduced simulation time and slighter sophistication as compared to the GaMD-WE approach, DeepWEST allows for fast estimation of kinetics with enhanced coverage.We have also developed an end-to-end package, DeepWEST, that automates the entire process of running MD trajectories, extracting statistically relevant conformations using VAMP theory and neural networks, and preparing the WE simulation.
The DeepWEST package is available at https://github.com/anandojha/DeepWEST.In conclusion, we have demonstrated a proof-of-concept of a hybrid data-driven approach that can be incorporated in the WE approach to obtain improved results with significantly lesser time and complexities.

Figure 2 :
Figure 2: Three metastable states of alanine dipeptide as an output of VAMPNets (a) separated by α (green) and β (red and blue) regions of the Ramachandran plot and (b) separated by negative (green and red) and positive (blue) values of ϕ and further separated by α (green) and β (red) regions of the Ramachandran plot.

Figure 3 :
Figure 3: Average free energy profile of alanine dipeptide from three separate 12 µs runs DeepWEST simulations.Rate constants to be calculated between the following regions of interest in alanine dipeptide: (a) α R ⇔ α L (b) P II ⇔ α L and (c) α R ⇔ P II

Figure 4 :
Figure 4: Average free energy profiles of alanine dipeptide from three separate 12 µs runs of WE, GaMD-WE, and DeepWEST simulations, respectively.

Figure 5 :
Figure 5: Average free energy profiles of alanine dipeptide from three separate runs of WE, GaMD-WE, and DeepWEST simulations at different stages the simulation.

Figure 7 :
Figure 7: Average free energy profile of chignolin from three separate 40 µs runs of Deep-WEST simulations.Rate constants to be calculated between the following regions of interest in chignolin: (a) Folded ⇔ Unfolded (b) I 1 ⇔ I 2 and (c) Folded ⇔ I 2

Figure 8 :
Figure 8: Three metastable states of chignolin as an output of VAMPNets.The upper panel shows the mean contact map for each metastable state while the lower panel shows the 3D representation for each of the states.

Figure 9 :
Figure 9: Average free energy profiles of chignolin from three separate 40 µs runs of WE, GaMD-WE, and DeepWEST simulations, respectively.

Figure 10 :
Figure 10: Average free energy profiles of chignolin from three separate runs of WE, GaMD-WE, and DeepWEST simulations at different stages the simulation.

Figure 13 :
Figure 13: Average free energy profiles of NTL9 from 100 µs runs of DeepWEST and WE simulations, respectively.
The efficiency of WE simulations is attributed to the weights of the starting conformations, and long-scale simulations of multiple short trajectories are required to obtain accurate kinetic and thermodynamic properties of systems of interest.Choice of initial starting conformations significantly affects the accuracy and simulation time, and it can be remarkably enhanced by providing an intelligently sampled set of initial starting conformations.The motivation for DeepWEST originates from this notion and aims at obtaining these conformations through deep learning approaches using VAMP.The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis (WESTPA) is an open-source package to perform the WE simulations.
86This toolkit is highly interoperable, compatible with a wide range of MD engines, and provides an integrated protocol for efficient storage and analysis of the estimates generated.Our current work incorporates the WESTPA package for running WE simulations.Since we are focused on obtaining the rate constants between multiple states in these systems, we run equilibrium WE simulations to sample their kinetic and thermodynamic properties.However, steady-state WE simulations can be employed for systems where rate constants are calculated between two states.Such simulations employ "target state recycling", i.e., the walkers are fed into the initial state once they reach the target state.Steady-state WE simulations with starting states from equilibrium WE are more commonly used for sampling kinetic and thermodynamic properties.

Table 1 :
Hyperparameters used in VAMPNets for alanine dipeptide

Table 3 :
Hyperparameters used in VAMPNets for chignolin