These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Holistic Prediction of pKa in Diverse Solvents Based on Machine Learning Approach

submitted on 04.06.2020 and posted on 04.06.2020 by Qi Yang, Yao Li, Jin-Dong Yang, Yidi Liu, Long Zhang, Sanzhong Luo, Jin-Pei Cheng
The acid dissociation constant pKa dictates a molecule’s ionic status and is a critical physicochemical property for rationalizing acid-base chemistry in solution and in many biological contexts. Although numerous theoretical approaches have been developed for predicting aqueous pKa, fast and accurate prediction of non-aqueous pKa values has remained a major challenge. Using the iBonD experimental pKa database, curated across 39 solvents, a holistic pKa prediction model was established with a machine learning approach. Structural and physical organic parameters combined descriptors (SPOC) were introduced to represent the electronic and structural features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with a mean absolute error (MAE) as low as 0.87 pKa units. The holistic model outperformed all the tested single-solvent models (SSMs), verifying its transfer learning features. The capability of prediction in diverse solvents allows for a comprehensive mapping of all the possible pKa correlations between different solvents. The iBonD holistic model was validated with high accuracy by predicting aqueous pKa and micro-pKa values of pharmaceutical molecules and pKa values of organocatalysts in DMSO and MeCN. An online prediction platform was constructed based on the current model.
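As a rough illustration of two ideas in the abstract, the sketch below shows how a mean absolute error in pKa units is computed and how a single "holistic" training row might carry solvent information alongside molecular descriptors. The abstract does not say how the solvent is encoded, so the one-hot encoding, the function names `mean_absolute_error` and `holistic_features`, and all numbers are assumptions made for illustration only; they are not the authors' implementation or data.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between experimental and predicted pKa values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def holistic_features(
    descriptors: Sequence[float], solvent: str, solvents: Sequence[str]
) -> List[float]:
    """Append a one-hot solvent vector to a molecular descriptor vector.

    Assumption: one-hot encoding is used here only as a simple way to let
    one model see data from several solvents at once.
    """
    if solvent not in solvents:
        raise ValueError(f"unknown solvent: {solvent}")
    one_hot = [1.0 if s == solvent else 0.0 for s in solvents]
    return list(descriptors) + one_hot


if __name__ == "__main__":
    # Made-up pKa values, not taken from iBonD or from the paper.
    experimental = [4.5, 12.0, 24.0]
    predicted = [5.0, 11.0, 24.0]
    print("MAE (pKa units):", mean_absolute_error(experimental, predicted))

    # Hypothetical two-element descriptor vector for a molecule measured in DMSO.
    row = holistic_features([0.1, 0.2], "DMSO", ["water", "DMSO", "MeCN"])
    print("feature row:", row)
```

A single-solvent model would, by contrast, be trained only on rows from one solvent and would need no solvent columns at all; comparing the MAE of the two on the same held-out molecules is the kind of comparison the abstract reports.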


Tsinghua University



Declaration of Conflict of Interest

No conflict of interest

