Protein pKa prediction with machine learning

04 October 2021, Version 2
This content is a preprint and has not undergone peer review at the time of posting.


Protein pKa prediction is essential for investigating the pH-dependent relationship between protein structure and function. In this work, we introduce DeepKa, a deep-learning-based protein pKa predictor that is trained and validated with pKa values derived from continuous constant pH molecular dynamics (CpHMD) simulations of 279 soluble proteins. The CpHMD implementation in the Amber molecular dynamics package was used (Huang, Harris, and Shen, J. Chem. Inf. Model. 2018, 58, 1372-1383). Notably, to avoid discontinuities at the boundary, grid charges are proposed to represent protein electrostatics. We show that the prediction accuracy of DeepKa is close to that of the CpHMD benchmarking simulations, validating DeepKa as an efficient protein pKa predictor. In addition, the training and validation sets created in this study can be applied to the future development of machine-learning-based protein pKa predictors. Finally, the grid charge representation is general and applicable to other topics, such as protein-ligand binding affinity prediction.
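To make the idea of grid charges concrete, the following is a minimal sketch of spreading atomic point charges onto a regular grid. The abstract does not specify the spreading scheme, so trilinear (cloud-in-cell) weighting is used here purely as an assumption, and the function name `spread_charges`, the parameters `n_cells` and `spacing`, and the box size are all hypothetical. With this kind of weighting, grid values change continuously as an atom moves across a cell boundary, unlike a nearest-voxel assignment in which the charge jumps from one voxel to the next.

```python
import numpy as np


def spread_charges(coords, charges, center, n_cells=20, spacing=1.0):
    """Spread point charges onto a cubic grid using trilinear weights.

    Illustrative only: the weighting scheme, grid size, and spacing are
    assumptions, not the settings used by DeepKa.
    """
    n_points = n_cells + 1
    grid = np.zeros((n_points, n_points, n_points))
    # Lower corner of the cubic box centred on `center`.
    origin = np.asarray(center, dtype=float) - 0.5 * n_cells * spacing

    for xyz, q in zip(np.asarray(coords, dtype=float), charges):
        frac = (xyz - origin) / spacing      # position in grid units
        base = np.floor(frac).astype(int)    # lower corner of enclosing cell
        # Atoms outside the box are ignored in this sketch.
        if np.any(base < 0) or np.any(base >= n_cells):
            continue
        t = frac - base                      # offset within the cell, in [0, 1)
        # Distribute the charge over the 8 corners of the enclosing cell.
        for dx in (0, 1):
            wx = t[0] if dx else 1.0 - t[0]
            for dy in (0, 1):
                wy = t[1] if dy else 1.0 - t[1]
                for dz in (0, 1):
                    wz = t[2] if dz else 1.0 - t[2]
                    grid[base[0] + dx, base[1] + dy, base[2] + dz] += q * wx * wy * wz
    return grid
```

Because the eight weights sum to one, the total charge of every atom inside the box is conserved on the grid. An atom sitting exactly on a grid point deposits all of its charge there, while an atom at the centre of a cell contributes one eighth of its charge to each corner.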


Constant pH molecular dynamics
pKa prediction
Deep learning
Charge spreading
Protein pKa database

Supplementary materials

Supporting information
Supplemental figures and tables, including the statistics of the training and test datasets, the distribution of solvent-accessible surface areas of residues in the training dataset, the correlation plots for PROPKA, and the pKa convergence analysis of the CpHMD simulations.

