Abstract
A predictive understanding of how proteins fold, misfold, and stabilize requires accurate molecular-level insights into the thermodynamic and kinetic forces shaping their backbones. While empirical force fields remain the workhorse of biomolecular simulations, their limited functional forms often fall short in capturing the complex many-body interactions that govern protein dynamics. Quantum-mechanical methods, on the other hand, offer high accuracy but are prohibitively expensive for large biomolecules. In this work, we introduce a generalized, intramolecular formulation of the data-driven many-body MB-nrg formalism that approaches “gold standard” coupled cluster accuracy in simulating polyalanine chains in the gas phase. By decomposing polyalanines into chemically intuitive building blocks, we develop modular and transferable potential energy functions that accurately reproduce reference energies, normal-mode harmonic frequencies, and conformational free-energy landscapes. Compared to empirical force fields commonly used in biosimulations, the MB-nrg potential energy function yields a smoother and more physically grounded free-energy surface, captures transient structural motifs underrepresented by empirical force fields, and enables flexible sampling of secondary structure transitions in longer peptides. This work establishes a foundation for extending coupled-cluster-level modeling to larger biomolecular systems under physiologically relevant conditions, while highlighting the methodological challenges that remain in achieving consistent accuracy at scale.
Supplementary materials
Title
Supporting Information
Description
Additional details about the composition of the n-body permutationally invariant polynomials
and training sets, description of the MB-nrg parameters, additional correlation plots between the
DLPNO-CCSD(T) reference energies and corresponding MB-nrg values for n-bodies and alanine
tripeptide, and a sample calculation of n-body energies from connected monomers.