Catalysis

Equipping data-driven experiment planning for Self-driving Laboratories with semantic memory: case studies of transfer learning in chemical reaction optimization

Authors

Abstract

Optimization strategies based on machine learning (ML), such as Bayesian optimization, show promise across the experimental sciences as a superior alternative to traditional design of experiments. Deploying ML optimization tools in R&D operations increases productivity and efficiency while reducing the time and cost necessary to identify new molecules, materials, and process parameters with desired target properties. Additional benefits can be captured when these ML algorithms are combined with automated laboratory equipment through Atinary’s orchestration software platform SDLabs. The synergy of these technologies is referred to as Self-driving Laboratories, which hold the potential to revolutionize scientific experimentation, data collection, and materials discovery. Thus far, however, autonomous experimentation projects have not fully leveraged pre-existing knowledge and databases, often beginning from scratch and sequentially collecting measurements from new experiments. This is in stark contrast to experimentation by humans, where trained experts rely on intuition acquired from experience to select initial parameter settings for a novel experiment. In this work, we introduce Atinary’s transfer learning algorithm SeMOpt, a general-purpose Bayesian optimization framework that uses meta-/few-shot learning to efficiently transfer knowledge from related historical experiments and databases to a novel experimental campaign via a compound acquisition function. We apply SeMOpt to chemical reaction optimization, an important and challenging task in chemistry.
Specifically, we perform two case studies: i) the optimization of five simulated cross-coupling reactions, which demonstrates the ability of our approach to adapt to data with unknown effects, such as the presence of a side reaction, catalyst deactivation, and measurement noise; ii) the optimization of palladium-catalyzed Buchwald-Hartwig cross-coupling of aryl halides with 4-methylaniline in the presence of potentially inhibitory additives. We find that SeMOpt accelerates the optimization rate by a factor of 10 or more compared to standard single-task ML optimizers (those without transfer learning capabilities to leverage historical experiments or databases). Moreover, these case studies show that SeMOpt outperforms several existing ML-based Bayesian optimization strategies that leverage historical data. Thus, we believe this work presents a valuable technical contribution to general-purpose optimization and makes the case for replacing the traditional trial-and-error experimentation process with Self-driving Labs augmented with semantic memory.
