ChemRxiv

A General Protocol for the Accurate Predictions of Molecular 13C/1H NMR Chemical Shifts via Machine Learning

Preprint submitted on 02.12.2019 and posted on 10.12.2019 by Peng Gao, Jun Zhang, Qian Peng, Vassiliki-Alexandra Glezakou
Accurate prediction of NMR chemical shifts at affordable computational cost is of great importance for rigorous structural assignment in experimental studies. However, the most popular computational schemes for NMR calculation, which are based on density functional theory (DFT) and gauge-including atomic orbital (GIAO) methods, still suffer from ambiguities in structural assignments. Using state-of-the-art machine learning (ML) techniques, we have developed a DFT+ML model that is capable of predicting 13C/1H NMR chemical shifts of organic molecules with high accuracy. The input for this generalizable DFT+ML model contains two critical parts: a vector describing the chemical environment, which can be evaluated without knowing the exact geometry of the molecule, and the DFT-calculated isotropic shielding constant. The DFT+ML model was trained on a dataset containing 476 13C and 270 1H experimental chemical shifts. For the DFT methods used here, the root-mean-square deviations (RMSDs) between predicted and experimental 13C/1H chemical shifts are as small as 2.10/0.18 ppm, much lower than those of the typical DFT (5.54/0.25 ppm) or DFT+linear regression (4.77/0.23 ppm) approaches. The model also has smaller RMSDs and maximum absolute errors than two previously reported NMR-predicting ML models. We tested the robustness of the model on two classes of organic molecules (TIC10 and hyacinthacines), for which we unambiguously identified the isomers that correspond to the experimental compounds. This DFT+ML model is a promising way of predicting NMR chemical shifts and can be easily adapted to calculate shifts for any chemical compound.
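
The abstract compares the model against a DFT+linear regression baseline and reports errors as RMSDs. As a rough illustration of how those two quantities are commonly computed, the sketch below fits a straight line from isotropic shielding constants to chemical shifts and evaluates the RMSD of the fit. This is not the authors' implementation: the function names are hypothetical, and the toy shielding and shift values are invented for illustration and are not taken from the preprint.

```python
import numpy as np


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equal-length arrays (in ppm)."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    return float(np.sqrt(np.mean((predicted - experimental) ** 2)))


def fit_linear_scaling(shieldings, shifts):
    """Least-squares fit of: shift = slope * shielding + intercept."""
    x = np.asarray(shieldings, dtype=float)
    y = np.asarray(shifts, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def predict_shifts(shieldings, slope, intercept):
    """Apply a fitted linear scaling to DFT isotropic shielding constants."""
    return slope * np.asarray(shieldings, dtype=float) + intercept


# Hypothetical toy values (assumption): NOT data from the preprint.
toy_shieldings = np.array([160.0, 120.0, 60.0, 30.0])  # isotropic shieldings, ppm
toy_shifts = np.array([25.0, 62.0, 128.0, 155.0])      # "experimental" shifts, ppm

slope, intercept = fit_linear_scaling(toy_shieldings, toy_shifts)
baseline = predict_shifts(toy_shieldings, slope, intercept)
baseline_rmsd = rmsd(baseline, toy_shifts)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, RMSD={baseline_rmsd:.2f} ppm")
```

The fitted slope is negative because a larger shielding corresponds to a smaller chemical shift. In the approach the abstract describes, the shielding constant is combined with a chemical-environment vector as input to an ML model rather than to a single straight line; that step is not sketched here because the abstract does not specify the model.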

Email Address of Submitting Author

zhangjunqcc@gmail.com

Institution

Pacific Northwest National Laboratory

Country

United States

ORCID For Submitting Author

0000-0001-9093-9430

Declaration of Conflict of Interest

The authors declare no competing financial interests.
