Abstract
Machine learning (ML) and artificial intelligence (AI) techniques are transforming the way chemical reactions are studied. High-throughput experimentation (HTE) is generating a growing number of valuable datasets for understanding how reaction conditions determine outcomes such as yields and selectivities. However, it is often overlooked that data from such designed experiments possess a very specific structure, which can be captured by appropriate statistical models. Ignoring this underlying structure when applying ML/AI algorithms can lead to completely misleading conclusions. In contrast, combining knowledge of the data-generating process with suitable estimation approaches yields reliable, interpretable, and comprehensive insights into chemical reaction mechanisms. A particularly complex dataset is available for the Buchwald-Hartwig amination. Using this dataset, an appropriate statistical model for HTE-generated chemical data is introduced, and a suitable parameter estimation algorithm is developed. Based on the estimated model, new insights into the Buchwald-Hartwig amination are thoroughly discussed. Our approach is directly applicable to a wide range of HTE-generated data for chemical reactions and beyond.
Supplementary materials
Title
Supplementary material for "Modelling and estimation of chemical reaction yields from high-throughput experiments"
Description
R code, Continuous Bernoulli distribution, Matrix of descriptors, Combinatorics of ANOVA with single replicates, Four-way ANOVA with single replicates, Algorithm