Widespread misinterpretation of pKa terminology and its consequences

08 August 2024, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The acid dissociation constant (pKa), which quantifies the propensity of a solute to donate a proton to its solvent, is crucial for drug design and synthesis, environmental fate studies, chemical manufacturing, and many other fields. Unfortunately, the terminology used to describe acid-base phenomena is inconsistent, creating considerable potential for misinterpretation. In this work, we examine a systematic confusion underlying the definition of “acidic” and “basic” pKa values for zwitterionic compounds. Because of this confusion, some pKa data are misrepresented in data repositories, including the widely used and highly trusted ChEMBL database. Such datasets are widely used to supply training data for pKa prediction models; hence, confusion and errors in the data degrade model performance. Herein, we discuss the intricacies of this issue. We make suggestions for describing acid-base phenomena, training pKa prediction models, and stewarding pKa datasets, given the high potential for confusion and the potentially high impact of accurately describing acid-base phenomena.
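To make the zwitterion case concrete, the following is a minimal sketch, not taken from the preprint, of how two pKa values determine the protonation states of a simple ampholyte such as glycine. The function name `species_fractions` is hypothetical, and the glycine values (about 2.34 and 9.60) are commonly quoted textbook approximations rather than data from this study. The sketch does not decide which value should be labelled “acidic” or “basic”; it only shows that both constants describe successive proton losses from the cation, through the zwitterion, to the anion.

```python
def species_fractions(ph, pka1, pka2):
    """Return the fractions (cation, zwitterion, anion) of a diprotic ampholyte.

    pka1 and pka2 are the first and second acid dissociation constants
    (as pKa values) of the fully protonated form, with pka1 < pka2.
    Assumes ideal dilute solution behaviour (activities equal concentrations).
    """
    h = 10.0 ** (-ph)      # proton concentration
    k1 = 10.0 ** (-pka1)   # cation -> zwitterion + H+
    k2 = 10.0 ** (-pka2)   # zwitterion -> anion + H+
    denom = h * h + k1 * h + k1 * k2
    return (h * h / denom, k1 * h / denom, k1 * k2 / denom)


# Assumed, approximate textbook pKa values for glycine (not from the preprint).
GLYCINE_PKA1 = 2.34
GLYCINE_PKA2 = 9.60

if __name__ == "__main__":
    for ph in (1.0, 2.34, 7.0, 9.60, 12.0):
        cation, zwitterion, anion = species_fractions(ph, GLYCINE_PKA1, GLYCINE_PKA2)
        print(f"pH {ph:5.2f}: cation={cation:.3f} zwitterion={zwitterion:.3f} anion={anion:.3f}")
```

With these assumed inputs, the zwitterion accounts for more than 99% of the glycine at pH 7, and the cation and zwitterion are equally abundant when the pH equals the lower pKa.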

Keywords

pka
cheminformatics
chembl
qupkake
acid dissociation
pkah
acidity
organic
solvation
basicity
terminology
data quality

Supplementary materials

Title: Data used in this study
Description:
- iupac_chembl_overlap.csv: experimental pKa data with SMILES, along with the corresponding ChEMBL calculations
- iupac_chembl_qupkake_downsampled.csv: experimental pKa data with SMILES, along with both ChEMBL and QupKake calculations, for a smaller subset of the data
- glycine_solubility_needham.csv: solubility data
