Building Machine Learning Force Fields of Proteins with Fragment-Based Approach and Data Transfer

18 June 2021, Version 3
This content is a preprint and has not undergone peer review at the time of posting.


We combined our generalized energy-based fragmentation (GEBF) approach with a transfer learning technique to construct machine learning force fields (MLFFs) for proteins solely from quantum mechanics (QM) calculations on small subsystems. Using a kernel-based model, the Gaussian Approximation Potential (GAP), our protocol automatically generates training sets with high efficiency. To facilitate the construction of training sets for various proteins, we created a protein data library that stores the data of all subsystems generated from previously trained proteins. With this data library, only the subsystems of a new protein that have new topological types must be added to build the corresponding training set. Using two polypeptides, 4ZNN and a segment of 1XQ8, as examples, we demonstrate that GEBF-MLFFs can be constructed with either kernel methods or neural network methods and reach full QM accuracy. Therefore, the present work provides an efficient and systematic way to build force fields with QM accuracy for biological systems such as proteins.
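The reuse of the data library described above can be illustrated with a small sketch. It is a hypothetical illustration, not the authors' implementation. The topology key (the ordered tuple of residue names in a subsystem), the `DataLibrary` class, and the `fake_qm` placeholder that stands in for a real QM calculation are all assumptions. The sketch only shows the bookkeeping idea: a subsystem triggers a new calculation only when its topological type has not been seen before.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical topology key: the ordered tuple of residue names in a subsystem.
TopologyKey = Tuple[str, ...]


@dataclass
class DataLibrary:
    """Hypothetical store of subsystem data, keyed by topological type."""
    entries: Dict[TopologyKey, dict] = field(default_factory=dict)

    def build_training_set(
        self,
        subsystems: List[TopologyKey],
        run_qm: Callable[[TopologyKey], dict],
    ) -> Tuple[List[dict], List[TopologyKey]]:
        """Assemble a training set, running QM only for unseen topological types."""
        training: List[dict] = []
        new_keys: List[TopologyKey] = []
        for sub in subsystems:
            key = tuple(sub)
            if key not in self.entries:
                # New topological type: compute once and store it for later proteins.
                self.entries[key] = run_qm(key)
                new_keys.append(key)
            # Known types are reused directly from the library.
            training.append(self.entries[key])
        return training, new_keys


calls: List[TopologyKey] = []


def fake_qm(key: TopologyKey) -> dict:
    """Placeholder for a QM calculation; returns dummy data, not a real energy."""
    calls.append(key)
    return {"topology": key, "energy": -1.0 * len(key)}


lib = DataLibrary()
protein_a = [("GLY", "ALA", "SER"), ("ALA", "SER", "VAL"), ("GLY", "ALA", "SER")]
_, new_a = lib.build_training_set(protein_a, fake_qm)
protein_b = [("ALA", "SER", "VAL"), ("SER", "VAL", "LYS")]
_, new_b = lib.build_training_set(protein_b, fake_qm)
print(len(new_a), len(new_b), len(calls))  # 2 1 3
```

For the second (hypothetical) protein, only one of its two subsystems requires a new calculation. This mirrors the abstract's point that only subsystems with new topological types are needed for a new protein.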


machine learning
force field

Supplementary materials

Supporting Information
Supporting information for the manuscript.

