Holistic Prediction of pKa in Diverse Solvents Based on Machine Learning Approach

04 June 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The acid dissociation constant pKa dictates a molecule’s ionic status and is a critical physicochemical property for rationalizing acid-base chemistry in solution and in many biological contexts. Although numerous theoretical approaches have been developed for predicting aqueous pKa, fast and accurate prediction of non-aqueous pKa values has remained a major challenge. On the basis of the iBonD experimental pKa database, curated across 39 solvents, a holistic pKa prediction model was established using a machine learning approach. Structural and physical organic parameters combined descriptors (SPOC) were introduced to represent the electronic and structural features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with a mean absolute error (MAE) as low as 0.87 pKa units. The holistic model performed better than all the tested single-solvent models (SSMs), which verifies its transfer learning capability. The ability to predict pKa in diverse solvents allows for a comprehensive mapping of all possible pKa correlations between different solvents. The iBonD holistic model was validated with high accuracy by predicting the aqueous pKa and micro-pKa values of pharmaceutical molecules and the pKa values of organocatalysts in DMSO and MeCN. An online prediction platform (http://pka.luoszgroup.com) was constructed based on the current model.
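To make the terms in the abstract concrete, the following is a minimal Python sketch of two ideas: assembling one input vector for a holistic (multi-solvent) model, and computing the MAE used to report performance. It is an illustration under stated assumptions, not the authors' implementation. The solvent list, the one-hot solvent encoding, the integer ionic status codes, the function names, and all numbers are hypothetical; the actual SPOC descriptors and ISL scheme are not specified in the abstract.

```python
from typing import Dict, List, Sequence

# Hypothetical solvent vocabulary for illustration only;
# the iBonD data set described above covers 39 solvents.
SOLVENTS: List[str] = ["water", "DMSO", "MeCN"]

# Hypothetical ionic status codes (an assumption, not the paper's ISL scheme).
IONIC_STATUS: Dict[str, int] = {"anionic": -1, "neutral": 0, "cationic": 1}


def build_features(descriptors: Sequence[float], solvent: str,
                   ionic_status: str) -> List[float]:
    """Concatenate molecular descriptors, a one-hot solvent vector,
    and an ionic status code into one input row."""
    if solvent not in SOLVENTS:
        raise ValueError(f"unknown solvent: {solvent}")
    if ionic_status not in IONIC_STATUS:
        raise ValueError(f"unknown ionic status: {ionic_status}")
    one_hot = [1.0 if s == solvent else 0.0 for s in SOLVENTS]
    return list(descriptors) + one_hot + [float(IONIC_STATUS[ionic_status])]


def mean_absolute_error(y_true: Sequence[float],
                        y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_true_i - y_pred_i|), here in pKa units."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Placeholder numbers, not experimental data.
row = build_features([0.5, 1.2], solvent="DMSO", ionic_status="neutral")
mae = mean_absolute_error([10.0, 12.0], [11.0, 11.5])
```

Because every row carries its own solvent encoding, records from all solvents can be pooled into a single training set, which is one plausible way a holistic model could share information across solvents where a single-solvent model cannot.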

Keywords

pKa prediction
machine learning
iBonD
XGBoost
neural network
organocatalyst
