A machine learning approach to model interaction effects: development and application to alcohol deoxyfluorination

09 December 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


The application of machine learning (ML) techniques to model high-throughput experimentation (HTE) datasets has seen a recent rise in popularity. Nevertheless, the ability to model the interplay between reaction components, known as interaction effects, with ML remains an outstanding challenge. Using a simulated HTE dataset, we find that the presence of irrelevant features poses a strong obstacle to learning interaction effects with common ML algorithms. To address this problem, we propose a two-part statistical modeling approach for HTE datasets: classical analysis of variance (ANOVA) of the experiment to identify systematic effects that impact reaction yield across the experiment, followed by regression of individual effects using chemistry-informed features. To illustrate this methodology, we use our previously published alcohol deoxyfluorination dataset comprising 740 reactions to build compact, interpretable regression models that account for each significant effect observed in the dataset. We achieve a sizeable performance boost compared to our previously published Random Forest model, reducing mean absolute error (MAE) from 18.1% to 13.4% and root mean squared error (RMSE) from 21.7% to 16.5% on a newly generated test set. Finally, we demonstrate that this approach can facilitate the generation of new mechanistic hypotheses which, when probed experimentally, can lead to a deeper understanding of chemical reactivity.
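As an illustration of the first step described above, the following minimal sketch decomposes a small two-factor yield table into a grand mean, main effects, and an interaction term, then reports the corresponding sums of squares. It is not the authors' implementation: the yield values, the factor labels (substrates and reagents), and the function names `decompose` and `sums_of_squares` are all hypothetical. With a single observation per cell, as assumed here, the interaction term cannot be separated from experimental noise, so a formal ANOVA F-test would need replicates or an independent error estimate.

```python
# Hypothetical yields (%) for a 3 x 3 screen: rows = substrates, columns = reagents.
# These numbers are invented for illustration; they are not from the published dataset.
yields = [
    [60.0, 70.0, 80.0],
    [40.0, 50.0, 60.0],
    [20.0, 30.0, 70.0],
]


def decompose(table):
    """Split a balanced two-way table into grand mean, main effects, and interaction."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    # Main effect of each row level: row mean minus grand mean.
    row_eff = [sum(row) / n_cols - grand for row in table]
    # Main effect of each column level: column mean minus grand mean.
    col_eff = [
        sum(table[i][j] for i in range(n_rows)) / n_rows - grand
        for j in range(n_cols)
    ]
    # Interaction: what remains after removing the additive part.
    inter = [
        [table[i][j] - grand - row_eff[i] - col_eff[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
    return grand, row_eff, col_eff, inter


def sums_of_squares(table):
    """Return the sum of squares attributed to each effect and the total."""
    grand, row_eff, col_eff, inter = decompose(table)
    n_rows, n_cols = len(table), len(table[0])
    return {
        "row": n_cols * sum(e ** 2 for e in row_eff),
        "col": n_rows * sum(e ** 2 for e in col_eff),
        "interaction": sum(e ** 2 for row in inter for e in row),
        "total": sum((y - grand) ** 2 for row in table for y in row),
    }


if __name__ == "__main__":
    for name, value in sums_of_squares(yields).items():
        print(f"{name}: {value:.1f}")
```

In a balanced design the three effect sums of squares add up to the total, so their ratios give a rough sense of how much of the yield variation is additive and how much depends on the specific pairing of components. Effects judged significant at this stage would then be the ones modeled by regression on chemistry-informed features.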


Keywords: high-throughput experimentation, ML prediction

Supplementary materials

Supporting Information
Experimental procedures, experimental data, and characterization and spectral data (PDF)

