Data-Driven Many-Body Simulations of Biomolecules with CCSD(T) Accuracy: I. Polyalanine in the Gas Phase

25 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

A predictive understanding of how proteins fold, misfold, and stabilize requires accurate molecular-level insights into the thermodynamic and kinetic forces shaping their backbones. While empirical force fields remain the workhorse of biomolecular simulations, their limited functional forms often fall short in capturing the complex many-body interactions that govern protein dynamics. Quantum-mechanical methods, on the other hand, offer high accuracy but are prohibitively expensive for large biomolecules. In this work, we introduce a generalized, intramolecular formulation of the data-driven many-body MB-nrg formalism that achieves "gold standard" coupled-cluster accuracy in simulating polyalanine chains in the gas phase. By decomposing polyalanine chains into chemically intuitive building blocks, we develop modular and transferable potential energy functions that accurately reproduce reference energies, normal-mode harmonic frequencies, and conformational free-energy landscapes. Compared with state-of-the-art force fields, the MB-nrg potential energy function yields a smoother and more realistic free-energy surface, captures transient structural motifs missed by empirical force fields, and enables flexible sampling of secondary-structure transitions in longer peptides. This work paves the way for "gold standard" coupled-cluster-level simulations of proteins under physiologically relevant conditions, bridging the gap between chemical accuracy and biological complexity.

Keywords

molecular interactions
biomolecules
peptides
data-driven potential energy functions
many-body expansion
machine learning
protein folding

Supplementary materials

Supporting Information: Details on the composition of the n-body permutationally invariant polynomials and training sets, description of the MB-nrg parameters, and additional correlation plots between the DLPNO-CCSD(T) reference n-body energies and corresponding MB-nrg values.
