Leveraging Post-Pretrained LLMs for Inverse Engineering High-Capacity Hydrogen-Storage Metal-Organic Frameworks: From Virtual Structures to Synthesized Materials

02 June 2025, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Metal-organic frameworks (MOFs) show significant promise for hydrogen storage owing to their high specific surface areas, tunable pore structures, and diverse topologies. Although machine learning (ML)-assisted inverse design has accelerated the screening of MOFs, experimentally realizing high-performance candidates drawn from large hypothetical libraries remains a critical challenge. This study introduces an inverse design strategy built on a domain-specific large language model, termed MOFs-LLM, to bridge the gap between virtual structures and experimental synthesis. MOFs-LLM is post-pretrained on a materials science corpus of 210 million tokens that integrates more than 15,000 MOF structures with detailed natural language descriptions extracted from more than 6,000 publications. Compared with models that rely on building block names, MOFs-LLM shows a 46.7% improvement in capturing structure–property relationships. After supervised fine-tuning on more than 8,000 question-answer pairs covering hydrogen storage mechanisms, materials, and chemical reasoning, MOFs-LLM generates 60 highly viable structural candidates under the dual constraints of performance and synthetic accessibility. Evaluation with SAscore indicates a 15.6% reduction in synthetic complexity compared with cluster-based screening methods. Following MOFs-LLM-suggested strategies involving solvent modulation and tuning of the Cu⁺/Cu²⁺ oxidation state, a novel MOF (Cu-LLMs-1) was synthesized in only three experimental iterations. It achieves a room-temperature hydrogen uptake of 1.33 wt%, placing it among the top five pure MOFs reported. These results highlight the potential of MOFs-LLM to connect virtual design with experimental realization and to facilitate the intelligent discovery and synthesis of high-performance hydrogen storage materials.
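The abstract describes keeping generated candidates only if they satisfy a performance constraint and a synthetic-accessibility constraint at the same time, and it reports synthetic complexity as a relative SAscore reduction against a baseline. The following is a minimal illustrative sketch of what such a dual-constraint filter and a relative-reduction calculation could look like. It is not the authors' implementation: the `Candidate` fields, the threshold values, and the choice of comparing mean SAscores are all hypothetical assumptions. The only external fact used is that SAscore runs from roughly 1 (easy to synthesize) to 10 (hard), so lower is better.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Candidate:
    # Hypothetical record for one generated MOF candidate (illustration only).
    name: str
    predicted_uptake_wt: float  # assumed predicted room-temperature H2 uptake, wt%
    sa_score: float             # SAscore, ~1 (easy) to 10 (hard); lower is better


def dual_constraint_filter(cands, min_uptake, max_sa):
    """Keep candidates meeting BOTH the performance and the synthesizability threshold."""
    return [
        c for c in cands
        if c.predicted_uptake_wt >= min_uptake and c.sa_score <= max_sa
    ]


def relative_reduction(baseline_scores, new_scores):
    """Fractional drop in mean SAscore relative to a baseline set (assumed metric)."""
    base = mean(baseline_scores)
    return (base - mean(new_scores)) / base


# Toy usage with made-up numbers (not data from the study).
cands = [
    Candidate("A", 1.4, 3.2),  # passes both constraints
    Candidate("B", 0.9, 2.5),  # fails the uptake constraint
    Candidate("C", 1.5, 6.8),  # fails the synthesizability constraint
]
kept = dual_constraint_filter(cands, min_uptake=1.0, max_sa=4.5)
reduction = relative_reduction([4.0, 6.0], [3.0, 5.0])

print([c.name for c in kept])                  # prints ['A']
print(f"SAscore reduction: {reduction:.1%}")   # prints SAscore reduction: 20.0%
```

Requiring both conditions in a single predicate mirrors the "dual constraints" wording: a candidate with excellent predicted uptake but a high SAscore is discarded, just like an easily synthesized but poorly performing one.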

Keywords

Large language model
Metal-organic frameworks
Hydrogen storage
Inverse design
Post-Pretraining

Supplementary materials

Supporting information
Data examples, methods, and result analysis ICONS used for training the model
