Abstract
This study presents a comprehensive evaluation of two reasoning-enhanced large language models (LLMs), DeepSeek R1 and OpenAI o4-mini, across six critical chemical tasks. By benchmarking these models against conventional LLMs and established computational tools, we systematically investigate the impact of reasoning capabilities on chemical cognition. Experimental results demonstrate that reasoning-enabled LLMs achieve significant performance improvements in foundational tasks, with DeepSeek R1 attaining 88.88% accuracy in SMILES-to-name conversion and 58% accuracy in point group identification, outperforming both OpenAI o4-mini (81.48% and 26%, respectively) and legacy models. However, domain-specific limitations persist: both models exhibit structural inaccuracies in CIF file generation (e.g., erroneous atomic connectivity) and struggle with ordered pattern synthesis in SEM simulations. Notably, while reasoning frameworks enhance logical coherence, they do not inherently resolve challenges in stereochemical assignments or rare symmetry group recognition. These findings underscore the necessity for domain-optimized training paradigms to bridge the gap between generic reasoning capabilities and specialized chemical applications.
Supplementary materials
Title
Supplementary materials
Description
This document provides a directory of the additional materials, together with the prompts used to generate the SEM images and the original images obtained.
Title
The crystal structure file generated by the large language model
Description
The crystal structure file generated by the large language model
<IMPORTANT DISCLAIMER>: All Crystallographic Information Files (.cif) within this directory are algorithmically generated by Large Language Models (LLMs) and do not represent experimental data.
Complete details are provided in the WARNING.txt file within the compressed archive.
Title
Additional data (S1-S4)
Description
This document provides detailed data for each evaluation task that are not reported in the main text, corresponding to S1-S4 in the additional materials.