Abstract
The accurate and reliable prediction of protein-ligand binding affinities can play a central role in drug discovery as well as in personalised medicine. Of considerable importance during lead optimisation are alchemical free energy methods, which estimate the relative binding free energies (RBFE) of similar molecules. Recent advances in these methods have increased their speed, accuracy and precision, as evidenced by the growing number of retrospective and prospective studies that employ them. However, their applicability in real-world scenarios remains limited by a number of important yet unresolved issues. Here, we report findings from a large dataset of over 500 ligand transformations spanning over 300 ligands that bind to 14 diverse protein targets; this dataset furnishes statistically robust results on the accuracy, precision and reproducibility of RBFE calculations. We use ensemble-based methods, which are the only way to provide reliable uncertainty quantification given that the underlying molecular dynamics is chaotic. These are implemented using TIES (Thermodynamic Integration with Enhanced Sampling), but they are equally applicable to free energy perturbation calculations, for which we expect very similar results. The results achieve chemical accuracy in all cases. Ensemble simulations also reveal the statistical distributions of the calculated free energies, which exhibit non-normal behaviour. We find that the “enhanced sampling” method known as replica exchange with solute tempering degrades RBFE predictions. We also report definitively on numerous associated alchemical factors, including the choice of ligand charge method, flexibility in ligand structure, and the size of the alchemical region, including the number of atoms involved in transforming one ligand into another.
Our findings provide a key set of recommendations that should be adopted for the reliable application of RBFE methods.
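The ensemble approach summarised above can be illustrated with a minimal sketch. It is not the TIES code or API; all function names below are hypothetical. It assumes that each replica of each alchemical leg yields mean values of dU/dλ at a set of λ windows, that each replica's ΔG is obtained by trapezoidal-rule thermodynamic integration (a common choice, assumed here), and that ΔΔG = ΔG_complex − ΔG_solvent. Because the ensemble results may be non-normal, the uncertainty is reported as a percentile bootstrap interval rather than a normal-theory standard error. The data in the demo are synthetic.

```python
import random
import statistics


def trapezoid(xs, ys):
    """Integrate ys over xs with the trapezoidal rule."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
        for i in range(len(xs) - 1)
    )


def ti_delta_g(lambdas, dudl_means):
    """Hypothetical helper: thermodynamic integration for one replica of one leg,
    dG = integral over lambda of <dU/dlambda>, assumed to use the trapezoidal rule."""
    return trapezoid(lambdas, dudl_means)


def bootstrap_ddg(complex_dgs, solvent_dgs, n_boot=10000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap interval for
    ddG = mean(dG_complex) - mean(dG_solvent).

    Each leg's replica ensemble is resampled independently with replacement.
    The percentile interval does not assume the replica results are normal.
    """
    rng = random.Random(seed)
    point = statistics.fmean(complex_dgs) - statistics.fmean(solvent_dgs)
    boot = sorted(
        statistics.fmean(rng.choices(complex_dgs, k=len(complex_dgs)))
        - statistics.fmean(rng.choices(solvent_dgs, k=len(solvent_dgs)))
        for _ in range(n_boot)
    )
    # Indices of the lower and upper percentiles in the sorted bootstrap sample.
    lo_idx = max(0, int(round(alpha / 2 * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return point, boot[lo_idx], boot[hi_idx]


if __name__ == "__main__":
    # Synthetic, illustrative data only: five replicas per leg, five lambda windows.
    demo_rng = random.Random(42)
    lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]

    def fake_replica(offset):
        # Linear dU/dlambda profile plus Gaussian noise (arbitrary assumption).
        return [offset + 4.0 * lam + demo_rng.gauss(0.0, 0.5) for lam in lambdas]

    complex_dgs = [ti_delta_g(lambdas, fake_replica(-3.0)) for _ in range(5)]
    solvent_dgs = [ti_delta_g(lambdas, fake_replica(-2.0)) for _ in range(5)]
    ddg, lo, hi = bootstrap_ddg(complex_dgs, solvent_dgs)
    print(f"ddG = {ddg:.2f} kcal/mol, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Resampling the two legs independently reflects the assumption that the complex and solvent simulations are run as separate ensembles. The replica count, λ schedule and noise model in the demo are arbitrary choices made for illustration.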