Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a novel workflow that combines graph neural network (GNN) models with semiempirical quantum mechanical (QM) features to achieve high accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark datasets, with root mean square errors (RMSEs) between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis reveals the crucial role of QM features in both the reaction site enumeration and micro-pKa prediction models. QupKake represents a significant advancement in micro-pKa prediction, offering a powerful tool for various applications in chemistry and beyond.
Supplementary materials
Title: Supplementary Information
Description: Molecular descriptors for the initial training set, experimental training set, test sets, protonation and deprotonation differences with SMARTS patterns, graph features used in the model, feature importance rankings, similarity scores, best and worst predictions in test sets, and parallel performance of the model.