Evaluation of machine learning models for the accelerated prediction of Density Functional Theory calculated 19F chemical shifts based on local atomic environments

31 July 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The introduction of fluorine into compounds plays a crucial role in drug development, as it greatly influences their pharmacokinetic and pharmacodynamic properties. Because fluorine is increasingly prevalent in FDA-approved drugs, identifying the mechanisms that drive the chemical transformations of fluorinated compounds has become crucial in drug discovery. 19F NMR spectroscopy is a powerful analytical technique for examining fluorine-containing compounds, offering valuable information about their structure, dynamics, and reactivity. Consequently, it has become a cornerstone in the mechanistic evaluation of fluorine-containing chemical transformations. NMR spectra can be interpreted with the help of Density Functional Theory (DFT), an ab initio modeling method that can predict NMR chemical shifts. However, the computational cost of DFT severely limits its use in screening compounds and discovering feasible drug candidates. Here, we present a machine learning approach to accelerate the prediction of DFT-calculated 19F NMR chemical shifts. The features describing each fluorine atom were derived from its local three-dimensional structural environment, i.e., the neighboring atoms within a radius of n Å of that fluorine atom. A comparative analysis of thirteen regression models was conducted using features extracted from 501 fluorinated compounds in our laboratory’s chemical inventory. The target chemical shift values were calculated by DFT with the quantum chemistry software ORCA. Among the models evaluated, Gradient Boosting Regression (GBR) performed best, achieving a mean absolute error of 2.89 to 3.73 ppm with a local environment radius of 3 Å. This accuracy is comparable to that of DFT calculations, while the computational time per compound drops from several hundred seconds to milliseconds.
A radius of 3 Å was also found to be optimal across all models for encoding local atomic environment features.
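The abstract does not specify how the neighboring atoms are encoded. As a minimal sketch, assuming a simple element-count descriptor (one possible encoding, not necessarily the one used in this work), the features of a fluorine atom can be built by counting atoms of each element within a cutoff radius r. The element list, the function names, and the toy fluoromethane geometry below are hypothetical illustrations; the mean absolute error used to rank the models is included for reference.

```python
import math
from collections import Counter

# Hypothetical element vocabulary: the abstract does not list which elements are encoded.
ELEMENTS = ("H", "C", "N", "O", "F", "S", "Cl")


def local_environment_features(symbols, coords, center, radius=3.0):
    """Count atoms of each element in ELEMENTS lying within `radius` (in angstroms)
    of the atom at index `center`; the center atom itself is excluded."""
    origin = coords[center]
    counts = Counter()
    for i, (symbol, position) in enumerate(zip(symbols, coords)):
        if i == center:
            continue  # skip the fluorine atom being described
        if math.dist(origin, position) <= radius:
            counts[symbol] += 1
    # Elements outside ELEMENTS are silently ignored in this sketch.
    return [counts[element] for element in ELEMENTS]


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error (e.g. in ppm) between reference and predicted shifts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy fluoromethane (CH3F) geometry in angstroms; illustrative, not DFT-optimized.
symbols = ["C", "F", "H", "H", "H"]
coords = [
    (0.00, 0.00, 0.00),
    (0.00, 0.00, 1.39),
    (1.03, 0.00, -0.36),
    (-0.51, 0.89, -0.36),
    (-0.51, -0.89, -0.36),
]

for r in (1.5, 3.0):
    # 1.5 -> [0, 1, 0, 0, 0, 0, 0] (only the bonded carbon)
    # 3.0 -> [3, 1, 0, 0, 0, 0, 0] (carbon plus the three hydrogens)
    print(r, local_environment_features(symbols, coords, center=1, radius=r))
```

Recomputing these vectors at several radii and retraining each regressor on them is one way to compare cutoffs, in the spirit of the radius study described above.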

Keywords

Machine Learning
19F NMR Spectroscopy
Chemical Shifts
Density Functional Theory
Gradient Boosting Regressor
Fluorine

Supplementary materials

Supplementary Information for “Evaluation of machine learning models for the accelerated prediction of Density Functional Theory calculated 19F chemical shifts based on local atomic environments”
Data documentation and source code for results disclosed in the manuscript.
