Abstract
We introduce a novel workflow integrating reasoning-capable language models with specialized chemical analysis tools to enhance molecular structure determination using nuclear magnetic resonance (NMR) spectroscopy. Structure elucidation generally involves generating candidate molecular structures, comparing their predicted spectral features to experimental data, and identifying the best-fitting structure. Our workflow systematically generates diverse molecular candidates through chemical synthesis predictions, regioisomer exploration, and direct spectral-based methods. The language model bridges the gap between quantitative data and chemical insight by evaluating candidates through a reasoning process that analyzes spectral evidence, explains discrepancies, and assesses overall structural plausibility, rather than relying on numerical error alone. This LLM-driven reasoning stage proved crucial, increasing the accuracy of correct top-ranked structure identification by 26.4%. Tests on simulated spectra with added noise artifacts and solvent peaks further highlighted the robustness of our method, with accuracy improving by 35.3%. The language model's confidence scores correlated with prediction accuracy, facilitating efficient triage of results. While currently focused on heteronuclear single quantum coherence (HSQC) data, this framework offers a flexible foundation for next-generation structure elucidation tools combining chemical expertise with advanced reasoning capabilities.
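The numerical part of the generate, predict, and compare loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scaling constants, and every peak value are assumptions chosen for illustration, and the error metric (a symmetric mean nearest-neighbour distance between HSQC peak lists) merely stands in for whatever peak-matching score the workflow actually uses. The LLM reasoning stage is not modelled here; this shows only the kind of numerical ranking that stage is meant to go beyond.

```python
from math import hypot

# Assumption: 13C shifts span roughly ten times the ppm range of 1H shifts,
# so the 13C axis is scaled down before distances are computed.
H_SCALE = 1.0
C_SCALE = 10.0


def peak_distance(p, q):
    """Scaled Euclidean distance between two (1H ppm, 13C ppm) peaks."""
    return hypot((p[0] - q[0]) / H_SCALE, (p[1] - q[1]) / C_SCALE)


def match_error(experimental, predicted):
    """Symmetric mean nearest-neighbour distance between two peak lists."""
    if not experimental or not predicted:
        return float("inf")
    # Average distance from each experimental peak to its closest predicted peak.
    forward = sum(
        min(peak_distance(e, p) for p in predicted) for e in experimental
    ) / len(experimental)
    # Same in the other direction, so unmatched predicted peaks are penalised too.
    reverse = sum(
        min(peak_distance(p, e) for e in experimental) for p in predicted
    ) / len(predicted)
    return (forward + reverse) / 2


def rank_candidates(experimental, candidates):
    """Return (name, error) pairs sorted from best to worst fit."""
    scored = [
        (name, match_error(experimental, peaks))
        for name, peaks in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical peak lists, invented for illustration only.
experimental = [(1.20, 14.1), (3.65, 58.2), (7.25, 128.4)]
candidates = {
    "candidate_A": [(1.18, 14.3), (3.70, 57.9), (7.30, 128.0)],
    "candidate_B": [(0.90, 22.5), (2.10, 30.1), (7.25, 128.4)],
}
ranking = rank_candidates(experimental, candidates)
```

In this toy example `candidate_A` ranks first because each of its predicted peaks lies close to an experimental peak. A purely numerical score of this kind cannot explain why a discrepancy occurs or whether a solvent peak is responsible, which is the gap the reasoning stage is intended to address.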
Supplementary weblinks
Title
ChemStructLLM: Enhancing Molecular Structure Elucidation with Reasoning-Capable LLMs
Description
We introduce a novel workflow integrating reasoning-capable language models with specialized chemical analysis tools to enhance molecular structure determination using nuclear magnetic resonance spectroscopy. Our framework combines:
Diverse Candidate Generation: Using Chemformer, Mol2Mol, and MultiModalSpectralTransformer (MMST) approaches
Quantitative Analysis: HSQC peak matching and spectral prediction
LLM-Driven Reasoning: Advanced interpretation of spectral evidence with chemical context
This integrated approach significantly improves structure elucidation accuracy, particularly for noisy or ambiguous spectral data.