Abstract
While hundreds of thousands of new chemical reactions are reported annually, efficient use of this vast collection of synthetic knowledge remains a persistent challenge in modern chemistry. Recent applications of large language models (LLMs) have shown promise, but systems that work reliably for de novo compounds and molecular transformations have remained elusive. Here we introduce MOSAIC (Multiple Optimized Specialists for AI-Driven Chemical Prediction), a computational framework that enables chemists to harness the collective knowledge of millions of reaction protocols. In contrast to existing approaches that rely on agentic models, MOSAIC builds on the open-source Llama3.1-8B-instruct architecture. By training 2,489 specialized chemical experts on Voronoi-clustered reaction spaces, we establish a scalable paradigm that delivers reproducible and human-readable experimental protocols for complex syntheses. Experimental validation demonstrates MOSAIC's ability to predict previously unreported transformations that can then be executed at the bench, including challenging Buchwald-Hartwig aminations, Suzuki couplings, and olefin metatheses. We further validate the approach through the successful synthesis of over 35 novel compounds spanning pharmaceuticals, materials, agrochemicals, and cosmetics. This framework establishes a new relationship between computational and experimental chemistry, providing a foundation for accelerated chemical discovery across disciplines.
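As an illustration of what a Voronoi-clustered expert layout implies, the following minimal sketch routes a query to the specialist whose cluster center lies nearest to it. The abstract does not describe how MOSAIC embeds reactions or selects an expert, so everything here is an assumption: the function name `nearest_expert`, the Euclidean metric, and the toy 2-D centroids are hypothetical stand-ins, not the authors' implementation.

```python
import math


def nearest_expert(embedding, centroids):
    """Return the index of the centroid closest to `embedding`.

    Assigning a point to its nearest center is exactly membership in that
    center's Voronoi cell. Euclidean distance is an assumed metric.
    """
    return min(
        range(len(centroids)),
        key=lambda i: math.dist(embedding, centroids[i]),
    )


# Hypothetical 2-D centroids standing in for learned cluster centers;
# a real system would hold one center per expert in a much
# higher-dimensional embedding space.
centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

# Hypothetical embedding of a query reaction.
query = (4.2, 4.9)
expert_id = nearest_expert(query, centroids)
print(f"route query to expert {expert_id}")
```

With one center per expert, the lookup is a linear scan over the centers; for thousands of experts, an approximate nearest-neighbor index would be a natural substitute.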
Supplementary materials
Title: Supplementary Information
Description: The Supplementary Information contains detailed computational and experimental data such as training logs, spectra and procedures.