Investigating errors in alchemical free energy predictions using random forest models and GaMD

27 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

State-of-the-art in silico ∆∆G predictions for antibody-antigen complexes achieve an accuracy of ±1 kcal/mol. While this is sufficient for high-throughput screening or affinity maturation, it is insufficient for assessing the criticality of post-translational modifications (PTMs) during clinical development. PTMs that impair binding by >50% pose a major risk to achieving the desired therapeutic bioactivity and must be controlled within defined limits to ensure product quality. A 50% loss in binding, measured through the dissociation constant (Kd), corresponds to a ∆∆G of approximately +0.5 kcal/mol, so in silico predictions need an accuracy of ±0.5 kcal/mol to be practically actionable in clinical phases. In this work, we use conventional molecular dynamics thermodynamic integration (cMD-TI) to generate ∆∆G predictions and develop an error analysis approach using random forest models and end-state Gaussian accelerated molecular dynamics (GaMD). This approach provides insight into inadequate sampling of key degrees of freedom (DOF) using only cMD-TI and end-state GaMD. We identify undersampling of bulky side chains and violation of energetically relevant interatomic interactions as major sources of error, and our GaMD-based error corrections lead to >1 kcal/mol improvements in accuracy in our most erroneous cases. When applied to a set of 13 predictions, the GaMD-based error correction reduced the root mean square error (RMSE) from 1.01 kcal/mol to 0.69 kcal/mol. This work introduces the application of alchemical free energy predictions to estimating PTM impacts on bioactivity and addresses the current errors that limit their practical use in clinical development.
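The two quantities the abstract relies on, the ∆∆G implied by a change in Kd and the RMSE of a set of predictions, can be illustrated with a minimal Python sketch. The relation ∆∆G = RT ln(Kd,variant / Kd,reference) and the RMSE formula are standard. The function names, the temperature of 298.15 K, and the reading of a "50% loss" as a twofold increase in Kd are assumptions made here for illustration and are not taken from the preprint.

```python
import math

# Gas constant in kcal/(mol*K); 8.314462618 J/(mol*K) divided by 4184 J/kcal.
R_KCAL = 1.987204e-3


def ddg_from_kd_ratio(kd_ratio, temperature_k=298.15):
    """Return ddG = RT ln(Kd_variant / Kd_reference) in kcal/mol.

    kd_ratio      -- Kd of the modified complex divided by Kd of the reference
    temperature_k -- absolute temperature (assumed value, not from the preprint)
    """
    if kd_ratio <= 0:
        raise ValueError("kd_ratio must be positive")
    return R_KCAL * temperature_k * math.log(kd_ratio)


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences of ddG values."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Assumption: a 50% loss in binding is modeled as Kd doubling.
    print(f"ddG for a twofold Kd increase: {ddg_from_kd_ratio(2.0):.2f} kcal/mol")

    # Hypothetical values for illustration only; not data from the preprint.
    example_predicted = [0.3, 1.2, -0.4]
    example_reference = [0.5, 0.2, -0.1]
    print(f"RMSE of the example set: {rmse(example_predicted, example_reference):.2f} kcal/mol")
```

Under these assumptions a twofold increase in Kd gives about 0.41 kcal/mol, the same order as the 0.5 kcal/mol threshold quoted above. The same `rmse` function applied to predicted and experimental ∆∆G values before and after a correction would give the kind of comparison the abstract reports.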

Keywords

∆∆G prediction
ddG prediction
relative binding free energy prediction
RBFE prediction
alchemical binding free energy prediction
alchemical free energy prediction
antibody-antigen complexes
antibody-antigen binding free energy prediction
protein-protein interactions
binding affinity prediction
alchemical free energy errors
random forest models
post translational modifications
critical quality attribute analysis
hydrogen bonds
thermodynamic integration
charge-changing perturbations
NMR restraints
salt bridge disruption
alchemical free energy salt bridges
sampling errors in alchemical free energy predictions
post-translational modifications
PTMs
thermodynamic integration error
gaussian accelerated molecular dynamics
GaMD
random forest models with thermodynamic integration

Supplementary materials

Title: Supplementary Materials
Description: Full empirical dataset, full TI predictions dataset, results from one-step corrections, explanation of key features referenced in the main text, barstar-barnase GaMD 1-D free energy profiles indicating possible salt bridges or hydrogen bonds (referenced in the Discussion section), random forest + GaMD error analysis applied to all 13 hu4D5-5 systems.
