Theoretical and Computational Chemistry

Building Machine Learning Force Fields of Proteins with Fragment-Based Approach and Data Transfer



We combined our generalized energy-based fragmentation (GEBF) approach with transfer learning to construct machine learning force fields (MLFFs) for proteins using only quantum mechanics (QM) calculations on small subsystems. Using a kernel-based model, the Gaussian Approximation Potential (GAP), our protocol generates training sets automatically and efficiently. To facilitate the construction of training sets for various proteins, we created a protein data library that stores the data for all subsystems generated from previously trained proteins. With this data library, only the subsystems of a new protein that have new topological types need to be generated to construct its training set. Using two polypeptides, 4ZNN and a segment of 1XQ8, as examples, we demonstrate that GEBF-MLFFs of full QM quality can be constructed with either kernel methods or neural network methods. The present work therefore provides an efficient and systematic way to build force fields with QM accuracy for biological systems such as proteins.
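The data-library idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `SubsystemLibrary`, the function `build_training_set`, the string topology keys, and the `run_qm` callable are all hypothetical names introduced here, and the sketch assumes each subsystem can be reduced to a hashable topological-type key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


@dataclass
class SubsystemLibrary:
    """Hypothetical store mapping a topological-type key to its QM record."""
    records: Dict[Hashable, dict] = field(default_factory=dict)

    def missing_types(self, subsystem_types: Sequence[Hashable]) -> List[Hashable]:
        """Return the keys not yet stored, deduplicated, in first-seen order."""
        unique = dict.fromkeys(subsystem_types)  # preserves order, drops repeats
        return [key for key in unique if key not in self.records]


def build_training_set(
    library: SubsystemLibrary,
    subsystem_types: Sequence[Hashable],
    run_qm: Callable[[Hashable], dict],
) -> Tuple[List[dict], List[Hashable]]:
    """Assemble a training set, calling run_qm only for unseen types.

    run_qm stands in for whatever QM calculation produces the reference
    data of one subsystem type (an assumption of this sketch).
    """
    newly_computed = library.missing_types(subsystem_types)
    for key in newly_computed:
        library.records[key] = run_qm(key)
    # Every required type is now in the library; collect one record per type.
    training_set = [library.records[key] for key in dict.fromkeys(subsystem_types)]
    return training_set, newly_computed
```

Under these assumptions, calling `build_training_set` for a second protein with the same `library` object triggers `run_qm` only for the topological types the first protein did not already contribute.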

Version notes

In this new version of the manuscript, we added results for GEBF-MLFFs constructed with a neural network (NN) method.



Supplementary material

Supporting Information
Supporting information for the new manuscript.