QM Assisted ML for 19F NMR Chemical Shift Prediction

07 August 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Ligand-observed 19F NMR detection is an efficient method for screening libraries of fluorinated molecules in fragment-based drug design campaigns. Screening fluorinated molecules in large mixtures makes 19F NMR a high-throughput method. Typically, these mixtures are generated from pools of well-characterized fragments. By predicting 19F NMR chemical shifts, mixtures could be generated for arbitrary fluorinated molecules, facilitating, for example, focused screens. In a previous publication, we introduced a method to predict 19F NMR chemical shifts using rooted fluorine fingerprints and machine learning (ML) methods. Having observed that the quality of the prediction depends on similarity to the training set, we here propose to assist the prediction with quantum mechanics (QM) based methods in cases where compounds are not well covered by the training set. Beyond similarity, the performance of ML methods could be associated with individual features of compounds. A combination of both could be used as a procedure to split input data sets into compounds that can be predicted by ML and compounds that require QM processing. We show on a proprietary fluorinated fragment library, known as LEF (Local Environment of Fluorine), and on a public Enamine data set of 19F NMR chemical shifts that ML and QM methods can synergize to outperform either method individually. Models built on Enamine data, as well as model building and QM workflow tools, can be found at https://github.com/PatrickPenner/lefshift and https://github.com/PatrickPenner/lefqm.
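
The similarity-based part of the ML/QM split described in the abstract can be illustrated with a minimal sketch. It is not the implementation in lefshift or lefqm. It assumes each fluorine environment is represented as a set of fingerprint on-bits (a stand-in for rooted fluorine fingerprints), uses Tanimoto similarity to the nearest training compound, and applies a hypothetical cutoff of 0.7. The function names, the cutoff, and the toy data are all assumptions made for illustration, and the feature-based criteria mentioned in the abstract are not covered.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is modeled as a set of on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(query: Fingerprint, training: List[Fingerprint]) -> float:
    """Similarity of a query to its nearest neighbor in the training set."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def split_ml_qm(
    queries: Dict[str, Fingerprint],
    training: List[Fingerprint],
    threshold: float = 0.7,  # hypothetical cutoff, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Route well-covered compounds to ML and the rest to QM."""
    ml_ids: List[str] = []
    qm_ids: List[str] = []
    for name, fp in queries.items():
        if max_similarity(fp, training) >= threshold:
            ml_ids.append(name)
        else:
            qm_ids.append(name)
    return ml_ids, qm_ids


# Toy data: frag_a matches a training fingerprint, frag_b shares no bits.
training = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
queries = {
    "frag_a": frozenset({1, 2, 3, 4}),
    "frag_b": frozenset({7, 8, 9}),
}
ml_ids, qm_ids = split_ml_qm(queries, training)
```

In this toy case `frag_a` is routed to ML and `frag_b` to QM. In practice the cutoff would have to be calibrated against observed prediction error on held-out data.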

Keywords

Fragment-based drug design
Fragment-based screening
19F NMR
Machine Learning
Quantum Chemistry

Supplementary materials

Supporting Information: Supporting information on the data sets, machine learning parameters and performance, the QM setup, and the results.
Data: Collection of Enamine training data, test data, QM, and ML results as CSV files.
