Theoretical and Computational Chemistry

Building Machine Learning Force Fields of Proteins with Fragment-Based Approach and Transfer Learning



Molecular dynamics (MD) simulation plays an essential role in understanding protein functions at the atomic level. At present, MD simulations of proteins are mainly based on classical force fields. However, classical force fields are still not accurate enough to describe the structures and dynamical properties of proteins reliably. Here we present a novel protocol to construct a machine learning force field (MLFF) for a given protein with full quantum mechanics (QM) accuracy. In this protocol, the energy of the target system is obtained by fitting the energies of its various subsystems, which are constructed with the generalized energy-based fragmentation (GEBF) approach. To facilitate the construction of MLFFs for various proteins, a protein data library is created to store all subsystem data generated from previously trained proteins. With this library, a new protein requires data only for those subsystems with new topological types in order to construct the corresponding MLFF. The protocol is illustrated with two polypeptides, 4ZNN and a 1XQ8 segment, as examples. The energies and forces predicted by the resulting MLFF are in good agreement with those from density functional theory calculations, and the dihedral angle distributions from GEBF-MLFF MD simulations reproduce those from ab initio MD simulations well. Therefore, this GEBF-ML protocol is expected to be an efficient and systematic way to build force fields with QM accuracy for proteins and other biological systems.
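The bookkeeping behind the data library can be illustrated with a minimal sketch. It is not the authors' implementation: the `Subsystem` class, the topology labels, the coefficients, and the energies are all hypothetical placeholders, and the total energy is reduced to a coefficient-weighted sum of subsystem energies, leaving out any further correction terms that the full GEBF expression may contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsystem:
    topology: str     # hypothetical label for the subsystem's topological type
    coefficient: int  # signed weight of the subsystem in the fragment sum
    energy: float     # subsystem energy (arbitrary placeholder value)


def new_topologies(subsystems, library):
    """Return the topological types that are not yet stored in the library."""
    return sorted({s.topology for s in subsystems} - set(library))


def total_energy(subsystems):
    """Coefficient-weighted sum of subsystem energies (simplified assumption)."""
    return sum(s.coefficient * s.energy for s in subsystems)


# Hypothetical library built from previously trained proteins.
library = {"ALA-GLY": "stored data", "GLY": "stored data"}

# Hypothetical subsystems of a new protein.
protein = [
    Subsystem("ALA-GLY", 1, -10.0),
    Subsystem("GLY-SER", 1, -12.0),
    Subsystem("GLY", -1, -4.0),
]

missing = new_topologies(protein, library)  # types that still need new data
energy = total_energy(protein)
```

In this toy case only the `GLY-SER` type is absent from the library, so only that subsystem would need new reference data before a force field for the new protein could be assembled.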
