Enhancing Molecular Structure Elucidation with Reasoning-Capable LLMs

Martin Priessner; Anna Tomberg; Jon Paul Janet; Richard Lewis; Magnus Johansson; Jonathan Goodman

doi:10.26434/chemrxiv-2025-8l64w

Abstract

We introduce a novel workflow that integrates reasoning-capable language models with specialized chemical analysis tools to improve molecular structure determination by nuclear magnetic resonance (NMR) spectroscopy. In general, structure elucidation involves generating candidate molecular structures, comparing their predicted spectral features with experimental data, and identifying the best-fitting structure. Our workflow systematically generates diverse molecular candidates through chemical synthesis predictions, regioisomer exploration, and direct spectrum-based methods. The language model bridges the gap between quantitative data and chemical insight: rather than relying on a simple numerical error score, it evaluates candidates through a reasoning process that analyzes the spectral evidence, explains discrepancies, and assesses overall structural plausibility. This LLM-driven reasoning stage proved crucial, increasing the accuracy of ranking the correct structure first by 26.4%. On simulated spectral data with added noise artifacts and solvent peaks, the method proved robust, improving accuracy by 35.3%. The language model's confidence scores correlated well with prediction accuracy, enabling efficient triage of results. Although currently focused on HSQC data, the framework offers a flexible foundation for next-generation structure elucidation tools that combine chemical expertise with advanced reasoning capabilities.
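The candidate-ranking step described above (compare predicted spectral features with experimental data, then pick the best fit) can be illustrated for HSQC cross-peaks with a minimal sketch. The abstract does not specify the matching algorithm, so the greedy one-to-one assignment, the carbon scaling factor `C_SCALE`, and the `UNMATCHED_PENALTY` below are illustrative assumptions; the predicted shifts would in practice come from an NMR prediction tool not shown here.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (1H shift, 13C shift), both in ppm

# Hypothetical weighting: 13C shifts span a much wider ppm range than 1H
# shifts, so carbon deviations are divided by this factor.
C_SCALE = 10.0
# Hypothetical cost added for every peak that cannot be paired.
UNMATCHED_PENALTY = 1.0


def peak_distance(a: Peak, b: Peak) -> float:
    """Scaled Euclidean distance between two HSQC cross-peaks."""
    return math.hypot(a[0] - b[0], (a[1] - b[1]) / C_SCALE)


def hsqc_error(experimental: List[Peak], predicted: List[Peak]) -> float:
    """Greedy one-to-one peak matching; returns mean cost per experimental peak."""
    # Every experimental/predicted pair, cheapest first.
    pairs = sorted(
        (peak_distance(e, p), i, j)
        for i, e in enumerate(experimental)
        for j, p in enumerate(predicted)
    )
    used_exp, used_pred = set(), set()
    total = 0.0
    for dist, i, j in pairs:
        # Accept a pair only if neither peak has been matched yet.
        if i not in used_exp and j not in used_pred:
            used_exp.add(i)
            used_pred.add(j)
            total += dist
    # Penalise leftover peaks on either side (missing or extra signals).
    unmatched = (len(experimental) - len(used_exp)) + (len(predicted) - len(used_pred))
    total += unmatched * UNMATCHED_PENALTY
    return total / max(len(experimental), 1)


def rank_candidates(
    experimental: List[Peak], candidates: Dict[str, List[Peak]]
) -> List[Tuple[str, float]]:
    """Return (SMILES, error) pairs sorted from best to worst fit."""
    scored = [(smi, hsqc_error(experimental, peaks)) for smi, peaks in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    experimental = [(1.22, 18.4), (3.69, 58.3)]
    candidates = {
        "CCO": [(1.20, 18.0), (3.70, 58.0)],  # ethanol: CH3 and CH2
        "COC": [(3.30, 60.0)],                # dimethyl ether: one CH3 environment
    }
    for smiles, err in rank_candidates(experimental, candidates):
        print(f"{smiles}: {err:.3f}")
```

An optimal assignment (for example the Hungarian algorithm, available as `scipy.optimize.linear_sum_assignment`) could replace the greedy loop; the greedy version simply keeps the sketch dependency-free. A purely numerical score like this is the baseline that the LLM reasoning stage is meant to go beyond.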

Keywords

Molecular Structure Elucidation

NMR Spectroscopy

Large Language Models (LLMs)

Computer-Assisted Structure Elucidation (CASE)

Explainable AI (XAI)

HSQC Spectroscopy

Supplementary weblinks

ChemStructLLM: Enhancing Molecular Structure Elucidation with Reasoning-Capable LLMs

We introduce a novel workflow integrating reasoning-capable language models with specialized chemical analysis tools to enhance molecular structure determination using nuclear magnetic resonance spectroscopy. Our framework combines:
Diverse Candidate Generation: using Chemformer, Mol2Mol, and MultiModalSpectralTransformer (MMST) approaches.
Quantitative Analysis: HSQC peak matching and spectral prediction.
LLM-Driven Reasoning: advanced interpretation of spectral evidence in its chemical context.
This integrated approach significantly improves structure elucidation accuracy, particularly for noisy or ambiguous spectral data.
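The three-stage framework listed above can be pictured as a small pipeline. This is a minimal sketch, not the ChemStructLLM code: the generator names appear only as dictionary keys (their real interfaces are not shown), the `Reasoner` signature and the 0.8 confidence `threshold` are hypothetical, and the stand-in reasoner votes by how many generators proposed a candidate instead of calling an LLM. The final triage step mirrors the abstract's point that confidence scores can separate results to accept from those needing review.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    smiles: str
    sources: List[str] = field(default_factory=list)  # generators that proposed it


def pool_candidates(proposals: Dict[str, List[str]]) -> List[Candidate]:
    """Merge candidate lists from several generators, dropping duplicates.

    Assumes the SMILES strings are already canonical; real input would need
    canonicalisation (e.g. with RDKit) before this comparison is meaningful.
    """
    pooled: Dict[str, Candidate] = {}
    for source, smiles_list in proposals.items():
        for smiles in smiles_list:
            cand = pooled.setdefault(smiles, Candidate(smiles))
            if source not in cand.sources:
                cand.sources.append(source)
    return list(pooled.values())


# Hypothetical interface for the LLM reasoning stage: it receives the pooled
# candidates and returns the chosen SMILES plus a confidence in [0, 1].
Reasoner = Callable[[List[Candidate]], Tuple[str, float]]


def elucidate(
    proposals: Dict[str, List[str]], reasoner: Reasoner, threshold: float = 0.8
) -> Tuple[str, float, str]:
    """Pool candidates, let the reasoner choose, then triage on its confidence."""
    candidates = pool_candidates(proposals)
    choice, confidence = reasoner(candidates)
    status = "accept" if confidence >= threshold else "manual review"
    return choice, confidence, status


if __name__ == "__main__":
    # Stand-in reasoner: favours the candidate proposed by the most generators.
    def vote_reasoner(cands: List[Candidate]) -> Tuple[str, float]:
        best = max(cands, key=lambda c: len(c.sources))
        return best.smiles, len(best.sources) / 3

    proposals = {
        "Chemformer": ["CCO", "COC"],
        "Mol2Mol": ["CCO"],
        "MMST": ["CCO", "CC=O"],
    }
    print(elucidate(proposals, vote_reasoner))
```

Keeping the reasoning stage behind a plain callable means a numerical scorer, a rule-based check, or an LLM call could be swapped in without changing how candidates are pooled or how results are triaged.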