Abstract
Metal-organic frameworks (MOFs) show significant promise for hydrogen storage owing to their high specific surface areas, tunable pore structures, and diverse topologies. Although machine learning (ML)-assisted inverse design has improved MOF screening, experimentally realizing high-performance candidates from extensive hypothetical libraries remains a critical challenge. This study introduces an inverse design strategy that uses a domain-specific large language model, termed MOFs-LLM, to bridge the gap between virtual structures and experimental synthesis. MOFs-LLM is built on a comprehensive materials science corpus of 210 million tokens, integrating over 15,000 MOF structures and detailed natural language descriptions extracted from more than 6,000 publications. Compared with models that rely on building block names, MOFs-LLM shows a 46.7% improvement in capturing structure–property relationships. After supervised fine-tuning on over 8,000 question-answer pairs covering hydrogen storage mechanisms, materials, and chemical reasoning, MOFs-LLM generates 60 highly viable structural candidates under the dual constraints of performance and synthetic accessibility. Evaluation with SAscore indicates a 15.6% reduction in synthetic complexity relative to cluster-based screening methods. Guided by MOFs-LLM strategies involving solvent modulation and tuning of the Cu⁺/Cu²⁺ oxidation states, a novel MOF (Cu-LLMs-1) was synthesized in just three experimental iterations and achieved a room-temperature hydrogen uptake of 1.33 wt%, placing it among the top five reported pure MOFs. These results highlight the potential of MOFs-LLM to connect virtual design with experimental realization, facilitating the intelligent discovery and synthesis of high-performance hydrogen storage materials.
Supplementary materials
Supporting information
Data examples, methods, and result analysis ICONS used for training the model