The Challenge of Balancing Model Sensitivity and Robustness in Predicting Yields: A Benchmarking Study of Amide Coupling Reactions

28 July 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accurate prediction of reaction yield is the holy grail for computer-assisted synthesis, but current models have failed to generalize to large literature datasets. To understand the causes and inspire future design, we systematically benchmarked the yield prediction task. We carefully curated and augmented a literature dataset of 41,239 amide coupling reactions, each with information on reactants, products, intermediates, yields, and reaction contexts, and provided 3D structures for the molecules. We calculated molecular features capturing 2D and 3D structural information, as well as physical and electronic properties. These descriptors were paired with four categories of machine learning methods (linear, kernel, ensemble, and neural network), yielding benchmarks of feature and model performance. Despite excellent performance on a high-throughput experiment (HTE) dataset (R² around 0.9), no method gave satisfactory results on the literature data. The best performance was an R² of 0.395 ± 0.020, obtained with a stacking technique. Error analysis revealed that reactivity cliffs and yield uncertainty are the main reasons for incorrect predictions. Removing reactivity cliffs and uncertain reactions raised the R² to 0.457 ± 0.006. These results highlight that yield prediction models must be sensitive to reactivity changes caused by subtle structural variation, and robust to the uncertainty associated with yield measurements.
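
For readers unfamiliar with the metric, the sketch below shows one common way a score such as "R² = 0.395 ± 0.020" can be computed: the coefficient of determination on each evaluation split, then the mean and standard deviation across splits. This is an illustration under stated assumptions, not the authors' evaluation code. The abstract does not specify the R² definition, the number of splits, or whether the sample or population standard deviation was reported. The function names and all yield values are hypothetical.

```python
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize(scores):
    """Mean and sample standard deviation across splits (an assumption)."""
    return mean(scores), stdev(scores)


# Hypothetical observed yields (%) and predictions for two illustrative splits.
splits = [
    ([80.0, 60.0, 40.0, 20.0], [70.0, 65.0, 45.0, 30.0]),
    ([90.0, 50.0, 30.0, 10.0], [75.0, 55.0, 40.0, 25.0]),
]

scores = [r_squared(y_true, y_pred) for y_true, y_pred in splits]
avg, sd = summarize(scores)
print(f"R2 = {avg:.3f} ± {sd:.3f}")
```

Under this definition, a model that always predicts the mean observed yield scores 0 and a perfect model scores 1, so a value near 0.4 means that most of the variance in the yields is left unexplained.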

Keywords

amide coupling reaction
reaction yield
machine learning

Supplementary materials

Supporting Information 1: Supplementary table, figures and methods
Supporting Information 2: Examples of reactivity cliffs and uncertain reactions
