MOF-ChemUnity: Unifying metal-organic framework data using large language models

03 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Artificial intelligence (AI) is transforming materials research on metal-organic frameworks (MOFs), where models trained on structured computational data routinely predict new materials and optimize their properties. This raises a central question: what if we could leverage the full breadth of MOF knowledge, including not just structured datasets but also the scientific literature? For human researchers, the literature is the primary source of knowledge, yet much of its content, including experimental data and expert insight, remains underutilized by AI systems. We introduce MOF-ChemUnity, a structured, extensible, and scalable knowledge graph that unifies MOF chemical data by linking literature-derived insights to crystal structures and computational datasets. By disambiguating MOF names in the literature and connecting them to crystal structures in the Cambridge Structural Database, MOF-ChemUnity unifies experimental and computational sources and enables knowledge extraction and linking across documents. We showcase how this enables multi-property machine learning across simulated and experimental data, compilation of complete synthesis records for individual compounds by aggregating information across multiple publications, and expert-guided materials recommendations via structure-based embeddings. When used as a knowledge source to augment large language models (LLMs), MOF-ChemUnity enables a literature-informed AI assistant that operates over the full scope of MOF knowledge. In expert evaluations, this assistant outperforms standard LLMs, showing improved accuracy, interpretability, and trustworthiness on tasks such as retrieval, inference of structure-property relationships, and materials recommendation. This work lays the foundation for literature-informed materials discovery, enabling both human scientists and AI systems to reason over the full landscape of MOF knowledge.
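
The abstract's core mechanism is linking differently named literature mentions to a single crystal structure, so that facts from separate papers accumulate on one compound. The following is a toy Python sketch of that idea under stated assumptions, not MOF-ChemUnity's implementation: the class names, the string-normalization rule, the refcode `REFCODE_A`, and the placeholder facts are all hypothetical. Where MOF-ChemUnity disambiguates names, this sketch substitutes a simple alias lookup on normalized strings. The aliases HKUST-1, Cu-BTC, and MOF-199 are real names used for the same framework.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MOFNode:
    """One crystal-structure node, keyed by a hypothetical CSD-style refcode."""
    refcode: str
    aliases: set[str] = field(default_factory=set)
    # property name -> list of (value, source document) pairs gathered across papers
    properties: dict[str, list[tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )


class ToyMOFGraph:
    """Hypothetical minimal graph: alias lookup stands in for real name disambiguation."""

    def __init__(self) -> None:
        self.nodes: dict[str, MOFNode] = {}
        self.alias_index: dict[str, str] = {}  # normalized alias -> refcode

    @staticmethod
    def _normalize(name: str) -> str:
        # Crude assumption: case-fold and drop spaces, hyphens, and underscores.
        return "".join(ch for ch in name.casefold() if ch not in " -_")

    def add_structure(self, refcode: str, aliases: list[str]) -> None:
        node = self.nodes.setdefault(refcode, MOFNode(refcode))
        for alias in aliases:
            node.aliases.add(alias)
            self.alias_index[self._normalize(alias)] = refcode

    def resolve(self, mention: str) -> str | None:
        # Map a literature mention to a structure node, or None if unknown.
        return self.alias_index.get(self._normalize(mention))

    def add_fact(self, mention: str, prop: str, value: str, source: str) -> bool:
        # Attach a literature-derived fact to the resolved structure; skip unknown names.
        refcode = self.resolve(mention)
        if refcode is None:
            return False
        self.nodes[refcode].properties[prop].append((value, source))
        return True

    def record(self, refcode: str) -> dict[str, list[tuple[str, str]]]:
        # Aggregated, cross-document record for one compound.
        return dict(self.nodes[refcode].properties)


graph = ToyMOFGraph()
graph.add_structure("REFCODE_A", ["HKUST-1", "Cu-BTC", "MOF-199"])
# Placeholder facts from two hypothetical papers that name the MOF differently.
graph.add_fact("HKUST-1", "synthesis_solvent", "<solvent reported in doc_1>", "doc_1")
graph.add_fact("Cu BTC", "synthesis_solvent", "<solvent reported in doc_2>", "doc_2")
print(graph.record("REFCODE_A"))
```

Both facts end up on the same node even though the two hypothetical papers spell the name differently; a production system would need far more robust disambiguation than string normalization.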

Keywords

Artificial Intelligence
Machine Learning
metal-organic frameworks
MOFs
chemical data
LLMs
large language models

Supplementary materials

Supplementary Information: Supplementary Information and Materials
