Abstract
The acid dissociation constant (pKa) dictates a molecule’s ionic status and is
a critical physicochemical property for rationalizing acid-base chemistry in
solution and in many biological
contexts. Although numerous theoretical approaches have been developed for
predicting aqueous pKa, fast and accurate prediction of non-aqueous pKa values
has remained a major challenge. Based on the iBonD experimental pKa database,
curated across 39 solvents, a holistic pKa prediction model was established
using a machine learning approach. Structural and physical organic parameters combined
descriptors (SPOC) were introduced to represent the electronic and structural
features of molecules. With SPOC and ionic status labelling (ISL), the holistic
models trained with a neural network or the XGBoost algorithm showed the best
prediction performance, with a mean absolute error (MAE) as low as 0.87 pKa units. The
holistic model outperformed all the tested single-solvent models (SSMs),
verifying its transfer learning character. Its ability to make predictions in
diverse solvents allows a comprehensive mapping of all possible pKa
correlations between different solvents. The iBonD
holistic model was validated by predicting the aqueous pKa and micro-pKa values
of pharmaceutical molecules and the pKa values of organocatalysts in DMSO and
MeCN with high accuracy. An online prediction platform
(http://pka.luoszgroup.com) was constructed based on the current model.