Equipping data-driven experiment planning for Self-driving Laboratories with semantic memory: case studies of transfer learning in chemical reaction optimization

10 May 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Optimization strategies based on machine learning (ML), such as Bayesian optimization, show promise across the experimental sciences as a superior alternative to traditional design of experiments. Deploying ML optimization tools in R&D operations increases productivity and efficiency while reducing the time and cost needed to identify new molecules, materials, and process parameters with desired target properties. Additional benefits can be captured by combining these ML algorithms with automated laboratory equipment through Atinary’s orchestration software platform, SDLabs. The combination of these technologies is referred to as a Self-driving Laboratory, which holds the potential to revolutionize scientific experimentation, data collection, and materials discovery. Thus far, however, autonomous experimentation projects have not fully leveraged pre-existing knowledge and databases, often beginning from scratch and sequentially collecting measurements from new experiments. This is in stark contrast to experimentation by humans, where trained experts rely on intuition acquired from experience to select initial parameter settings for a novel experiment. In this work, we introduce Atinary’s transfer learning algorithm SeMOpt, a general-purpose Bayesian optimization framework that uses meta-/few-shot learning to efficiently transfer knowledge from related historical experiments and databases to a novel experimental campaign via a compound acquisition function. We apply SeMOpt to chemical reaction optimization, an important and challenging task in chemistry.
Specifically, we perform two case studies: i) the optimization of five simulated cross-coupling reactions, which demonstrates the ability of our approach to adapt to data with unknown effects, such as the presence of a side reaction, catalyst deactivation, and measurement noise; ii) the optimization of the palladium-catalyzed Buchwald-Hartwig cross-coupling of aryl halides with 4-methylaniline in the presence of potentially inhibitory additives. We find that SeMOpt accelerates the optimization rate by a factor of 10 or more compared to standard single-task ML optimizers (those without transfer learning capabilities to leverage historical experiments or databases). Moreover, these case studies show that SeMOpt outperforms several existing ML-based Bayesian optimization strategies that leverage historical data. Thus, we believe this work presents a valuable technical contribution for general-purpose optimization and makes the case for replacing the traditional trial-and-error experimentation process with Self-driving Labs augmented with semantic memory.
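
To make the idea of a compound acquisition function concrete, the following is a minimal sketch and not the SeMOpt algorithm itself, whose details are not given in this abstract. It assumes a one-dimensional reaction condition scaled to [0, 1], a simple zero-mean Gaussian process surrogate, and a hypothetical weighting rule in which the influence of the historical (source) model decays geometrically as measurements from the new (target) campaign accumulate. All function names, the kernel, the `decay` parameter, and the synthetic yield curve are illustrative assumptions.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    if len(x_train) == 0:
        # No observations yet: fall back to the prior (mean 0, std 1).
        return np.zeros(len(x_query)), np.ones(len(x_query))
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    mean = k_tq.T @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_tq * np.linalg.solve(k_tt, k_tq), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over `best` for a maximization problem."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def normalize(values):
    """Rescale an array to the range [0, 1]."""
    return (values - values.min()) / (values.max() - values.min() + 1e-12)

def source_weight(n_target, decay=0.5):
    """Hypothetical rule: trust in historical data shrinks per new measurement."""
    return decay ** n_target

def compound_acquisition(x_query, x_src, y_src, x_tgt, y_tgt, decay=0.5):
    """Blend a source-task prediction with target-task expected improvement."""
    src_mean, _ = gp_posterior(x_src, y_src, x_query)
    tgt_mean, tgt_std = gp_posterior(x_tgt, y_tgt, x_query)
    best = y_tgt.max() if len(y_tgt) else 0.0
    ei = expected_improvement(tgt_mean, tgt_std, best)
    w = source_weight(len(x_tgt), decay)
    return w * normalize(src_mean) + (1.0 - w) * normalize(ei)

# Synthetic historical campaign: yield peaks near a condition value of 0.7.
grid = np.linspace(0.0, 1.0, 101)
x_src = np.linspace(0.0, 1.0, 15)
y_src = np.exp(-0.5 * ((x_src - 0.7) / 0.15) ** 2)

# New campaign with no measurements yet: the suggestion comes from history.
no_data = np.array([])
acq = compound_acquisition(grid, x_src, y_src, no_data, no_data)
x_next = grid[np.argmax(acq)]
print(f"first suggested condition: {x_next:.2f}")
```

In this sketch the first suggestion for the new campaign is driven entirely by the historical data, mirroring how an expert would pick initial settings from experience; once target measurements arrive, the expected-improvement term takes over so that the optimizer can adapt when the new reaction behaves differently from the historical ones.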

Keywords

machine learning
Bayesian optimization
reaction optimization
organic chemistry
catalysis
