Enhancing Molecular Structure Elucidation with Reasoning-Capable LLMs

06 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We introduce a workflow that integrates reasoning-capable large language models (LLMs) with specialized chemical analysis tools to improve molecular structure determination from nuclear magnetic resonance (NMR) spectroscopy. Structure elucidation generally involves generating candidate molecular structures, comparing their predicted spectral features to experimental data, and identifying the best-fitting structure. Our workflow systematically generates diverse molecular candidates through chemical synthesis predictions, regioisomer exploration, and direct spectral-based methods. The LLM bridges the gap between quantitative data and chemical insight: it evaluates each candidate through a reasoning process that analyzes the spectral evidence, explains discrepancies, and assesses overall structural plausibility, rather than relying on numerical error alone. This LLM-driven reasoning stage proved crucial, increasing the accuracy of correct top-ranked structure identification by 26.4%. Tests on simulated spectral data with introduced noise artifacts and solvent peaks further highlighted the robustness of the method, with accuracy improving by 35.3%. The LLM's confidence scores correlated with prediction accuracy, facilitating efficient triage of results. While currently focused on heteronuclear single quantum coherence (HSQC) data, this framework offers a flexible foundation for next-generation structure elucidation tools that combine chemical expertise with advanced reasoning capabilities.
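
The generic elucidation loop described above (predict spectra for each candidate, compare them with the experiment, rank by fit) can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the purely numerical baseline that an LLM reasoning stage would then re-evaluate. The function names, the greedy peak matching, the scaling factors, and the unmatched-peak penalty are all assumptions chosen for illustration. HSQC peaks are represented as (1H ppm, 13C ppm) pairs.

```python
from math import hypot

# Assumed scaling: 13C shifts span a much wider ppm range than 1H,
# so the carbon axis is down-weighted. These values are illustrative only.
H_SCALE = 1.0
C_SCALE = 0.1
UNMATCHED_PENALTY = 1.0  # assumed cost for each peak left without a partner


def peak_distance(p, q):
    """Scaled Euclidean distance between two (1H, 13C) peaks."""
    return hypot((p[0] - q[0]) * H_SCALE, (p[1] - q[1]) * C_SCALE)


def hsqc_error(predicted, experimental):
    """Greedy one-to-one peak matching (a simplification, not optimal assignment).

    Returns the summed match distance plus unmatched-peak penalties,
    divided by the larger of the two peak counts.
    """
    remaining = list(experimental)
    total = 0.0
    for p in predicted:
        if not remaining:
            # More predicted peaks than experimental ones.
            total += UNMATCHED_PENALTY
            continue
        best = min(remaining, key=lambda q: peak_distance(p, q))
        total += peak_distance(p, best)
        remaining.remove(best)
    # Experimental peaks that no predicted peak accounted for.
    total += UNMATCHED_PENALTY * len(remaining)
    return total / max(len(predicted), len(experimental), 1)


def rank_candidates(candidates, experimental):
    """Return (name, error) pairs sorted from best to worst numerical fit.

    `candidates` maps a candidate label to its predicted peak list.
    """
    scored = [(name, hsqc_error(peaks, experimental))
              for name, peaks in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up peak lists, for demonstration only.
    experimental = [(7.26, 128.0), (3.70, 55.0)]
    candidates = {
        "candidate_A": [(7.30, 128.5), (3.68, 55.2)],
        "candidate_B": [(1.20, 20.0), (2.10, 30.0)],
    }
    for name, err in rank_candidates(candidates, experimental):
        print(f"{name}: error = {err:.3f}")
```

A ranking of this kind yields only a single error value per candidate. The abstract's point is that the reasoning stage examines why a candidate fits or fails, for example by attributing an unmatched peak to a solvent signal, which a fixed penalty term like the one assumed here cannot do.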

Keywords

Molecular Structure Elucidation
NMR Spectroscopy
Large Language Models (LLMs)
Computer-Assisted Structure Elucidation (CASE)
Explainable AI (XAI)
HSQC Spectroscopy
