Accurate and Rapid Prediction of pKa of Transition Metal Complexes: Semiempirical Quantum Chemistry with a Data-Augmented Approach

14 October 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Rapid and accurate prediction of reactivity descriptors of transition metal (TM) complexes is a major challenge for contemporary quantum chemistry. The recently developed GFN2-xTB method, based on density functional tight-binding (DFTB) theory, is suitable for high-throughput calculation of geometries and thermochemistry of TM complexes, albeit with moderate accuracy. Herein we present a data-augmented approach that substantially improves the accuracy of GFN2-xTB for the prediction of thermochemical properties, using the pKa values of TM hydrides as a representative example. We constructed a comprehensive database of ca. 200 TM hydride complexes containing the experimentally measured pKa values together with the GFN2-xTB optimized geometries and various computed electronic and energetic descriptors. The GFN2-xTB results were further refined and validated by DFT calculations with the hybrid PBE0 functional. Our results show that although GFN2-xTB performs well in most cases, it fails to adequately describe TM complexes featuring multicarbonyl and multihydride ligand environments. The dataset was analyzed by ordinary least squares (OLS) fitting and was used to construct an automated machine learning (AutoML) approach for the rapid estimation of the pKa of TM hydride complexes. The results show that the very fast AutoML model has a high predictive power (RMSE ~ 2.7 pKa units), comparable to that of the much slower DFT calculations (RMSE ~ 3 pKa units). The presented data-augmented quantum chemistry-based approach is promising for high-throughput computational screening workflows of homogeneous TM-based catalysts.
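The fitting step described above can be illustrated with a minimal sketch. It assumes that a single computed descriptor (here a hypothetical GFN2-xTB deprotonation energy) is mapped linearly onto experimental pKa values by OLS, and that the fit is scored by RMSE. The function names `fit_ols` and `rmse` and all numerical values are illustrative placeholders, not data or code from this work; the AutoML model reported here uses many descriptors and is not reproduced.

```python
from math import sqrt


def fit_ols(x, y):
    """Closed-form ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference values."""
    n = len(reference)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


# Synthetic placeholder data (NOT taken from the database in this work):
# a hypothetical computed deprotonation energy per complex and a
# hypothetical "experimental" pKa for the same complex.
computed_descriptor = [250.0, 262.0, 271.0, 280.0, 295.0]
experimental_pka = [8.1, 12.4, 15.0, 19.3, 24.9]

slope, intercept = fit_ols(computed_descriptor, experimental_pka)
corrected_pka = [slope * d + intercept for d in computed_descriptor]

print(f"slope = {slope:.4f}, intercept = {intercept:.2f}")
print(f"training RMSE = {rmse(corrected_pka, experimental_pka):.2f} pKa units")
```

In practice the error would be estimated on held-out complexes rather than on the training set, since a training-set RMSE understates the prediction error for new complexes.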


Machine Learning Methods Enable Predictive Modeling
Semiempirical quantum chemical calculation methods
pKa values
Transition metal complexes

