Can reasoning power significantly improve the knowledge of large language models for chemistry? --Based on conversations with Deepseek and ChatGPT

22 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

This study presents a comprehensive evaluation of reasoning-enhanced large language models (LLMs), namely DeepSeek R1 and OpenAI o4-mini, across six critical chemical tasks. By benchmarking these models against conventional LLMs and established computational tools, we systematically investigate the impact of reasoning capabilities on chemical cognition. Experimental results demonstrate that reasoning-enabled LLMs achieve significant performance improvements in foundational tasks, with DeepSeek R1 attaining 88.88% accuracy in SMILES-to-name conversions and 58% accuracy in point group identification, outperforming both OpenAI o4-mini (81.48% and 26%, respectively) and legacy models. However, domain-specific limitations persist: both models exhibit structural inaccuracies in CIF file generation (e.g., erroneous atomic connectivity) and struggle with ordered pattern synthesis in SEM simulations. Notably, while reasoning frameworks enhance logical coherence, they do not inherently resolve challenges in stereochemical assignments or rare symmetry group recognition. These findings underscore the necessity for domain-optimized training paradigms to bridge the gap between generic reasoning capabilities and specialized chemical applications.
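As a rough illustration of how an accuracy figure such as those quoted for SMILES-to-name conversion might be computed, the sketch below scores model outputs against reference names by normalized exact match. The abstract does not describe the scoring procedure, so the matching rule, the function names `normalize` and `accuracy`, and the toy data are all assumptions made for illustration, not the authors' method.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as errors (an assumed rule)."""
    return " ".join(name.lower().split())


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions that match their reference
    after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one reference is required")
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical toy data, not taken from the preprint.
references = ["ethanol", "benzene", "acetic acid", "propan-2-ol"]
predictions = ["Ethanol", "benzene", "acetic  acid", "propan-1-ol"]

score = accuracy(predictions, references)
print(f"{score:.2f}%")
```

Exact matching is a strict criterion: chemically equivalent synonyms (for example, a common name versus a systematic name) would be scored as errors unless the comparison is extended, which is one reason the scoring rule matters when comparing reported accuracies.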

Keywords

Chemoinformatics
Reasoning LLM
DeepSeek
ChatGPT
Chemical Tasks

Supplementary materials

Title: Supplementary materials
Description: S1. Detailed output results of the SMILES code and chemical name conversion process. S2. Detailed output results of logP. S3. Prompts and corresponding raw images used during the image generation procedure.

Title: The crystal structure file generated by the large language model
Description: IMPORTANT DISCLAIMER: All Crystallographic Information Files (.cif) within this directory are algorithmically generated by large language models (LLMs) and do not represent experimental data. Complete details are provided in the WARNING.txt file within the compressed archive.
