Abstract
The discovery of novel reactions and optimization of reaction conditions are fundamental challenges in organic synthesis, with significant implications for retrosynthetic analysis and condition selection. This work proposes a data-driven strategy for reaction discovery, integrating high-throughput experimentation (HTE) with insights derived from large language models (LLMs). By leveraging LLMs to process chemical information from extensive literature, the method enables hypothesis-driven design and experimental validation, minimizing reliance on serendipity.
Taking cross-electrophile coupling (XEC) as a case study, this research extracts key trends, substrate combinations, and reaction conditions from 520 relevant publications. The methodology identifies unexplored substrate pairs and designs reaction plates for HTE, facilitating systematic discovery. Additionally, the concept of directed evolution in chemical catalysis is explored, hypothesizing that catalytic conditions can evolve systematically based on structural and reactivity similarities.
The findings demonstrate the utility of combining LLMs with HTE for reaction discovery and catalysis research. The approach emphasizes methodology development: it prioritizes generating hypotheses and protocols over isolated reaction discoveries, and it offers a scalable framework for advancing chemical innovation.
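The step of identifying unexplored substrate pairs and laying them out on an HTE plate can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the electrophile class names, the set of "reported" pairs, the condition labels, and the helper names `unexplored_pairs` and `design_plate` are all hypothetical, and a 96-well, row-major layout is assumed.

```python
from itertools import combinations, product

# Hypothetical inputs (not taken from the paper): substrate classes, pairs
# assumed to be already reported in the literature, and condition labels.
electrophile_classes = ["aryl bromide", "alkyl bromide", "aryl chloride", "alkyl tosylate"]
reported_pairs = {
    frozenset(["aryl bromide", "alkyl bromide"]),
    frozenset(["aryl chloride", "alkyl bromide"]),
}
conditions = ["cond-1", "cond-2", "cond-3"]


def unexplored_pairs(classes, reported):
    """Return unordered class pairs that do not appear in the reported set."""
    return [pair for pair in combinations(classes, 2) if frozenset(pair) not in reported]


def design_plate(pairs, conds, rows="ABCDEFGH", n_cols=12):
    """Map each (pair, condition) experiment to a well, filling row by row."""
    wells = [f"{row}{col}" for row in rows for col in range(1, n_cols + 1)]
    experiments = list(product(pairs, conds))
    if len(experiments) > len(wells):
        raise ValueError("design exceeds plate capacity")
    return dict(zip(wells, experiments))


pairs = unexplored_pairs(electrophile_classes, reported_pairs)
plate = design_plate(pairs, conditions)
```

With these toy inputs, four of the six possible class pairs are unreported, so crossing them with three conditions fills the twelve wells of the first plate row. In practice the reported set would come from the literature extraction, and the plate layout would follow the constraints of the HTE equipment.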
Supplementary materials
Title: Supporting Information
Description: Full prompt, plus additional visualization analysis of the extracted conditions and reactivity information