Long timescale ensemble methods in molecular dynamics: Ligand-protein interactions and allostery in SARS-CoV-2 targets

27 March 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


We subject a series of five protein-ligand systems that contain important SARS-CoV-2 targets (3-chymotrypsin-like protease, papain-like protease and adenosine ribose phosphatase) to long-timescale and adaptive sampling molecular dynamics simulations. By performing ensembles of ten or twelve 10-microsecond simulations for each system, we accurately and reproducibly determine ligand binding sites, both crystallographically resolved and otherwise, thereby discovering binding sites that can be exploited for drug discovery. We also report robust, ensemble-based observation of conformational changes that occur at the main binding site of 3CLPro due to the presence of another ligand at an allosteric binding site, explaining the underlying cascade of events responsible for its inhibitory effect. Using our simulations, we have discovered a novel allosteric mechanism of inhibition for a ligand known to bind only at the substrate binding site. Due to the chaotic nature of molecular dynamics trajectories, individual trajectories do not allow for accurate or reproducible elucidation of macroscopic expectation values. In a comparison that is unprecedented at this timescale, we examine the statistical distributions of protein-ligand contact frequencies for these ten or twelve 10-microsecond trajectories and find that over 90% of trajectories have significantly different contact frequency distributions. Furthermore, using a direct binding free energy calculation protocol, we determine the ligand binding free energies for each of the identified sites using long-timescale simulations. The free energies differ by 0.77 to 7.26 kcal/mol across individual trajectories, depending on the binding site and the system. We show that, although reporting results from a single simulation is currently the standard practice at long timescales, an individual simulation does not yield reliable free energies.
Ensembles of independent trajectories are necessary to overcome the aleatoric uncertainty and obtain statistically meaningful and reproducible results. Our findings are generally applicable to all molecular dynamics-based applications and are not confined to the free energy methods used in this study. Finally, we compare the application of different free energy methods to these systems and discuss their advantages and disadvantages.
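
To illustrate why an ensemble gives a more defensible estimate than any single trajectory, the following is a minimal sketch of an ensemble mean with a bootstrap confidence interval. It is not the protocol used in the study: the function name `bootstrap_ensemble_mean`, the resampling settings and the per-replica free energies are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_ensemble_mean(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) for the mean of per-replica values.

    The interval is a percentile bootstrap over replicas, resampled
    with replacement. All settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(values)
    # Mean of each bootstrap resample, sorted so percentiles can be read off.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical binding free energies (kcal/mol), one per independent replica.
# These numbers are made up; they are NOT results from the study.
replica_dg = [-6.1, -4.8, -7.3, -5.5, -6.9, -3.9, -6.4, -5.1, -7.8, -5.9]

mean_dg, ci_low, ci_high = bootstrap_ensemble_mean(replica_dg)
spread = max(replica_dg) - min(replica_dg)

print(f"ensemble mean = {mean_dg:.2f} kcal/mol, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
print(f"range across single replicas = {spread:.2f} kcal/mol")
```

In this toy example the range across single replicas is several kcal/mol, whereas the bootstrap interval on the ensemble mean is much narrower. That is the qualitative point the abstract makes about single-trajectory estimates.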


Binding Affinity
Molecular Dynamics
Protein-ligand binding
Uncertainty Quantification
Aleatoric uncertainty
Long timescale
Long simulations
Microsecond timescale
Ensemble simulations

Supplementary materials

Supporting Information
Figures displaying contact frequency distributions, KS statistics, p-boxes and cumulative density functions, as well as comparisons of contact frequency distributions from long simulations and splitting protocols, have been included in the Supporting Information for all systems that were not accommodated in the main text. All input structure and parameter files are available in a public GitHub repository at https://github.com/UCL-CCS/LongTimescaleStudy.
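
As a rough illustration of the kind of pairwise comparison that KS statistics summarise, the sketch below computes a two-sample Kolmogorov-Smirnov statistic and an asymptotic p-value for every pair of trajectories. It assumes the comparison is a standard two-sample KS test; the implementation used for the Supporting Information is not given here. The helper names (`ks_statistic`, `ks_pvalue`), the synthetic contact frequencies and the 0.05 threshold are hypothetical.

```python
import bisect
import math
import random
from itertools import combinations


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the data points
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges poorly here; the p-value is ~1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


# Synthetic per-residue contact frequencies in [0, 1]; NOT data from the study.
rng = random.Random(1)


def fake_contacts(mu, n=200):
    return [min(1.0, max(0.0, rng.gauss(mu, 0.1))) for _ in range(n)]


trajectories = {
    "traj_1": fake_contacts(0.5),
    "traj_2": fake_contacts(0.5),
    "traj_3": fake_contacts(0.8),  # deliberately shifted distribution
}

results = {}
for (name_a, xs), (name_b, ys) in combinations(trajectories.items(), 2):
    d = ks_statistic(xs, ys)
    p = ks_pvalue(d, len(xs), len(ys))
    results[(name_a, name_b)] = (d, p)
    verdict = "different" if p < 0.05 else "not distinguishable"
    print(f"{name_a} vs {name_b}: D = {d:.3f}, p = {p:.3g} ({verdict})")
```

For small samples an exact p-value would be preferable to the asymptotic series used here.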

