Tuning Potential Functions to Host–Guest Binding Data

14 December 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


Software to predict protein–ligand binding affinities more rapidly and accurately is of high interest for early-stage drug discovery, and physics-based methods are among the most widely used technologies for this purpose. The accuracy of these methods depends critically on the accuracy of the potential functions they use. Potential functions are typically trained against a combination of quantum chemical and experimental data. However, although binding affinities are among the most important quantities to predict, experimental binding affinities have not yet been integrated into the experimental datasets used to train potential functions. In recent years, host–guest complexes have gained popularity as tractable models of binding thermodynamics because they are much smaller and simpler than protein–ligand systems. Host–guest complexes can also avoid ambiguities that arise in protein–ligand systems, such as uncertain protonation states. Experimental host–guest binding data are therefore an appealing additional data type for optimizing potential functions. Here, we report an extension of the Open Force Field Evaluator framework that enables the systematic calculation of host–guest binding free energies and their gradients with respect to force field parameters, together with a curated set of 126 host–guest complexes with available experimental binding free energies. As an initial application of this infrastructure, we optimized generalized Born (GB) cavity radii for the OBC2 GB implicit solvent model against experimental data for 36 host–guest systems. This refitting dramatically improved accuracy for both the training set and a separate test set of 90 additional host–guest systems. The optimized radii also showed encouraging transferability from host–guest systems to 59 protein–ligand systems.
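Fitting parameters to experimental binding free energies, given calculated values and their gradients, can be illustrated with a minimal sketch. It assumes a plain sum-of-squared-errors objective and fixed-step gradient descent; the preprint may use a different objective, regularization, or optimizer. The function names, the linear surrogate that stands in for real free energy calculations, and all numbers are hypothetical, and none of this is the Open Force Field Evaluator API.

```python
import numpy as np


def objective(dg_calc, dg_expt, jacobian):
    """Sum of squared errors between calculated and experimental binding
    free energies, plus its gradient with respect to the parameters.

    dg_calc, dg_expt: shape (n_systems,)
    jacobian: shape (n_systems, n_params), d(dg_calc_i)/d(param_k)
    """
    residual = dg_calc - dg_expt
    loss = float(residual @ residual)
    grad = 2.0 * jacobian.T @ residual  # chain rule: dL/dp = 2 J^T r
    return loss, grad


def fit(params, compute, dg_expt, n_steps, learning_rate):
    """Plain gradient descent; `compute(params)` returns (dg_calc, jacobian)."""
    history = []
    for _ in range(n_steps):
        dg_calc, jacobian = compute(params)
        loss, grad = objective(dg_calc, dg_expt, jacobian)
        history.append(loss)
        params = params - learning_rate * grad
    return params, history


# Hypothetical toy stand-in for real free energy calculations: a linear
# surrogate dg_calc = A @ params + b, whose Jacobian is simply A.
rng = np.random.default_rng(0)
A = rng.normal(size=(36, 4))  # 36 "training systems", 4 "radius" parameters
b = rng.normal(size=36)
dg_expt = rng.normal(size=36)


def surrogate(params):
    return A @ params + b, A


# Step size 1/L, where L = 2 * sigma_max(A)**2 bounds the loss curvature,
# so each gradient step cannot increase the loss.
step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
params, history = fit(np.ones(4), surrogate, dg_expt, n_steps=50, learning_rate=step)
print(f"loss: {history[0]:.2f} -> {history[-1]:.2f}")
```

In a real workflow each call to `compute` would launch free energy simulations, so the expensive part is estimating `dg_calc` and its Jacobian, not the optimizer update itself.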
However, the optimized radii are significantly smaller than the baseline radii and lead to excessively favorable hydration free energies (HFEs). Users of the OBC2 GB model must therefore currently choose between GB cavity radii that yield more accurate binding affinities and radii that yield more accurate HFEs. We suspect that achieving good accuracy on both will require more far-reaching changes to the GB model. We note that binding free energy calculations using the OBC2 model in OpenMM run roughly 10 times faster than the corresponding explicit solvent calculations, suggesting a future role for implicit solvent absolute binding free energy (ABFE) calculations in virtual compound screening. This study establishes proof of principle for using host–guest systems to train potential functions that are transferable to protein–ligand systems, and it provides infrastructure that enables a range of applications.
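Why smaller cavity radii push hydration free energies toward more favorable values can be seen in the Born equation, the single-ion limit that GB models generalize: the electrostatic solvation free energy scales as 1/R. The sketch below assumes a point charge in a spherical cavity, a solute dielectric of 1, and a solvent dielectric of 78.5. It is not the OBC2 model itself, which derives effective Born radii from atomic radii with a many-body descreening correction, and the function name is hypothetical.

```python
# Coulomb constant in kcal*Angstrom/(mol*e^2)
COULOMB = 332.0637


def born_solvation_energy(charge, radius, eps_solvent=78.5, eps_solute=1.0):
    """Born estimate of the electrostatic solvation free energy (kcal/mol)
    of a point charge (in e) at the center of a sphere of the given radius
    (in Angstrom). Smaller radius -> more negative (more favorable) energy."""
    prefactor = 1.0 / eps_solute - 1.0 / eps_solvent
    return -0.5 * COULOMB * prefactor * charge**2 / radius


# Shrinking the cavity radius makes solvation increasingly favorable.
for radius in (2.0, 1.8, 1.6):
    energy = born_solvation_energy(1.0, radius)
    print(f"R = {radius:.1f} A: {energy:.1f} kcal/mol")
```

Because the energy is inversely proportional to the radius, even a modest uniform reduction in radii strengthens the solvation of every charged or polar group, which is consistent with the overly favorable HFEs described above.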


implicit solvent
explicit solvent
binding free energy
molecular dynamics

Supplementary materials

Supplementary Information
Supplementary information, including tables summarizing the host–guest and protein–ligand binding free energies and the hydration free energies, and figures for the test data set.
