Protein pKa prediction with machine learning

29 July 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Protein pKa prediction is essential for investigating the pH-dependent relationship between protein structure and function. In this work, we introduce DeepKa, a deep-learning-based protein pKa predictor trained on 12,809 pKa values derived from continuous constant pH molecular dynamics (CpHMD) simulations of 279 soluble proteins. The CpHMD implementation in the Amber molecular dynamics package was employed (Huang, Harris, and Shen, J. Chem. Inf. Model. 2018, 58, 1372-1383). Notably, to deal with the finite-size effect, we propose representing electrostatics with grid charges rather than the previously used atomic charges. We show that the prediction accuracy of DeepKa is close to that of the CpHMD benchmarking simulations, validating DeepKa as an efficient protein pKa predictor. In addition, the training dataset created in this study can be applied to the future development of machine-learning-based protein pKa predictors. Finally, the new grid charge representation is general and applicable to other problems, such as protein-ligand binding affinity prediction.
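The grid charge idea can be illustrated with a minimal sketch. The abstract does not specify how atomic charges are mapped onto the grid, so the trilinear (cloud-in-cell) weighting below is an assumption chosen for illustration, not the DeepKa scheme. The function name `spread_charges` and its parameters are hypothetical.

```python
import math


def spread_charges(atoms, origin, spacing, shape):
    """Spread point charges onto a regular 3D grid with trilinear weights.

    ASSUMPTION: trilinear (cloud-in-cell) spreading is used purely for
    illustration; the preprint abstract does not state the actual scheme.

    atoms   -- iterable of (x, y, z, q) tuples
    origin  -- (x0, y0, z0) position of grid node (0, 0, 0)
    spacing -- distance between neighbouring grid nodes
    shape   -- (nx, ny, nz) number of nodes along each axis
    Returns a dict mapping node index (i, j, k) to accumulated charge.
    """
    grid = {}
    for x, y, z, q in atoms:
        # Position of the atom in fractional grid coordinates.
        f = [(c - o) / spacing for c, o in zip((x, y, z), origin)]
        base = [math.floor(v) for v in f]
        frac = [v - b for v, b in zip(f, base)]
        # Distribute the charge over the 8 corners of the enclosing cell.
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    idx = (base[0] + di, base[1] + dj, base[2] + dk)
                    if not all(0 <= n < s for n, s in zip(idx, shape)):
                        continue  # charge outside the box is dropped
                    w = 1.0
                    for d, t in zip((di, dj, dk), frac):
                        w *= t if d else (1.0 - t)
                    if w:
                        grid[idx] = grid.get(idx, 0.0) + q * w
    return grid


# Usage: a unit charge at the centre of a cell is shared equally by 8 nodes.
cell = spread_charges([(0.5, 0.5, 0.5, 1.0)], (0.0, 0.0, 0.0), 1.0, (2, 2, 2))
print(sorted(cell.items()))
```

With this weighting the total charge inside the box is conserved, and the resulting fixed-size array of grid values could serve as input to a convolutional model regardless of how many atoms fall inside the box.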


Constant pH molecular dynamics
pKa prediction
Deep learning
Charge spreading
Protein pKa database

