Collective Intelligence of Specialized Language Models Guides Realization of de novo Chemical Synthesis

31 January 2025, Version 1

Abstract

While hundreds of thousands of new chemical reactions are reported annually, efficient use of this vast collection of synthetic knowledge remains a persistent challenge in modern chemistry. Recent applications of large language models (LLMs) have shown promise, but systems that reliably work for de novo compounds and molecular transformations have remained elusive. Here we introduce MOSAIC (Multiple Optimized Specialists for AI-Driven Chemical Prediction), a computational framework that enables chemists to harness the collective knowledge of millions of reaction protocols. In contrast to existing approaches relying on agentic models, MOSAIC leverages the open-source Llama3.1-8B-instruct architecture. By training 2,489 specialized chemical experts on Voronoi-clustered reaction spaces, we establish a scalable paradigm that delivers reproducible and human-readable experimental protocols for complex syntheses. Experimental validation demonstrates MOSAIC's ability to predict and execute previously unreported transformations, including challenging reactions via Buchwald-Hartwig amination, Suzuki coupling, and olefin metathesis. We validate this approach through the successful synthesis of over 35 novel compounds spanning pharmaceuticals, materials, agrochemicals, and cosmetics. This framework establishes a new relationship between computational and experimental chemistry, providing a foundation for accelerated chemical discovery across disciplines.
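
The abstract does not say how reactions are represented or how the Voronoi cells are built, so the following is only a toy sketch of the general idea: a Voronoi partition assigns each point to its nearest centroid, and a query can then be routed to the specialist responsible for that cell. The 2-D coordinates, the three centroids, and the function names are all hypothetical stand-ins, not MOSAIC's actual embeddings, cluster count, or code.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def partition(points, centroids):
    """Group point indices by Voronoi cell, i.e. by nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for idx, p in enumerate(points):
        cells[nearest_centroid(p, centroids)].append(idx)
    return cells

# Hypothetical 2-D "reaction embeddings" and cluster centroids (illustration only).
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
reactions = [(1.0, 1.0), (9.0, 1.0), (1.0, 8.0), (6.0, 0.0)]

# Each cell's members would form the training set of one specialist.
cells = partition(reactions, centroids)

# At query time, the same nearest-centroid rule selects which specialist to ask.
query_expert = nearest_centroid((8.5, -0.5), centroids)
```

Under these assumptions, training data and queries are split by the same rule, so each specialist only sees queries from the region of reaction space it was trained on.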

Keywords

Large Language Models
Collective Chemical Intelligence
Reaction Development
Organic Synthesis and Reaction

Supplementary materials

Supplementary Information
The Supplementary Information contains detailed computational and experimental data, such as training logs, spectra, and procedures.
