A General Protocol for the Accurate Predictions of Molecular 13C/1H NMR Chemical Shifts via Machine Learning


Accurate prediction of NMR chemical shifts at affordable computational cost is of great importance for rigorous structural assignment in experimental studies. However, the most popular computational schemes for NMR calculation, based on density functional theory (DFT) and gauge-including atomic orbital (GIAO) methods, still suffer from ambiguities in structural assignments. Using state-of-the-art machine learning (ML) techniques, we have developed a DFT+ML model that is capable of predicting 13C/1H NMR chemical shifts of organic molecules with high accuracy. The input for this generalizable DFT+ML model contains two critical parts: a vector describing chemical environments, which can be evaluated without knowing the exact geometry of the molecule, and the DFT-calculated isotropic shielding constant. The DFT+ML model was trained with a dataset containing 476 13C and 270 1H experimental chemical shifts. For the DFT methods used here, the root-mean-square deviations (RMSDs) between predicted and experimental 13C/1H chemical shifts are as small as 2.10/0.18 ppm, much lower than those of the typical DFT (5.54/0.25 ppm) or DFT+linear regression (4.77/0.23 ppm) approaches. The model also has smaller RMSDs and maximum absolute errors than two previously reported NMR-predicting ML models. We tested the robustness of the model on two classes of organic molecules (TIC10 and hyacinthacines), for which we unambiguously assigned the correct isomers to the experimental data. This DFT+ML model is a promising way of predicting NMR chemical shifts and can be easily adapted to calculate shifts for any chemical compound.
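To make the error metrics and the DFT+linear regression baseline concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the common practice of fitting experimental shifts against DFT isotropic shieldings with a straight line, then reporting the RMSD and the maximum absolute error. The function names (`fit_line`, `rmsd`, `max_abs_error`) and all numerical values are hypothetical and chosen only for illustration; the DFT+ML model itself is not reproduced here.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmsd(pred, ref):
    """Root-mean-square deviation between predicted and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def max_abs_error(pred, ref):
    """Largest absolute difference between predicted and reference values."""
    return max(abs(p - r) for p, r in zip(pred, ref))


# Made-up illustrative values in ppm; these are NOT data from the paper.
sigma_dft = [160.0, 120.0, 60.0, 40.0]  # hypothetical DFT isotropic shieldings
delta_exp = [20.0, 58.0, 121.0, 138.0]  # hypothetical experimental shifts

# Linear scaling: shift ~ slope * shielding + intercept (slope is negative,
# since a larger shielding corresponds to a smaller chemical shift).
slope, intercept = fit_line(sigma_dft, delta_exp)
delta_pred = [slope * s + intercept for s in sigma_dft]

print("slope:", slope, "intercept:", intercept)
print("RMSD (ppm):", rmsd(delta_pred, delta_exp))
print("max abs error (ppm):", max_abs_error(delta_pred, delta_exp))
```

In this sketch the same points are used for fitting and evaluation purely for brevity; a real comparison would evaluate on shifts held out from the fit.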