MERMaid: Universal multimodal mining of chemical reactions from PDFs using vision-language models

07 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Data digitisation of scientific literature is essential to expedite the creation of machine-learnable knowledge bases for data-driven research and integration with knowledge-intensive systems like self-driving laboratories. However, automating the extraction, interpretation, and structuring of data from information-rich graphical elements within the prevalent PDF format remains a significant challenge. We present MERMaid (Multimodal aid for Reaction Mining), an end-to-end knowledge ingestion pipeline that automatically converts disparate information conveyed through figures and tables across various PDFs into a coherent and machine-actionable knowledge graph. By leveraging the emergent visual cognition and reasoning capabilities of vision-language models, MERMaid demonstrates chemical context awareness, self-directed context completion, and robust coreference resolution, achieving an end-to-end overall accuracy of 87%. Notably, MERMaid is topic-agnostic and adaptable to various chemical domains. Its modular design and extensibility facilitate future application to diverse scientific data beyond reaction mining, promising to unlock the full potential of scientific literature for knowledge-intensive applications.
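
To make the target output concrete, the following is a minimal sketch of how one mined reaction might be flattened into knowledge-graph triples once compound labels have been resolved. It is not MERMaid's implementation: the `ReactionRecord` class, the `resolve_labels` and `to_triples` helpers, the predicate names, and the example compounds and conditions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReactionRecord:
    """Hypothetical record for one reaction mined from a figure or table."""
    reaction_id: str
    reactants: list
    products: list
    conditions: dict = field(default_factory=dict)


def resolve_labels(names, label_map):
    """Replace figure labels such as '1a' with full names when a mapping is known."""
    return [label_map.get(name, name) for name in names]


def to_triples(record, label_map):
    """Flatten one record into (subject, predicate, object) triples."""
    triples = []
    for reactant in resolve_labels(record.reactants, label_map):
        triples.append((record.reaction_id, "has_reactant", reactant))
    for product in resolve_labels(record.products, label_map):
        triples.append((record.reaction_id, "has_product", product))
    # Sort condition keys so the triple order is deterministic.
    for key, value in sorted(record.conditions.items()):
        triples.append((record.reaction_id, f"has_{key}", value))
    return triples


# Illustrative, made-up data: label '1a' is resolved, '2a' has no mapping yet.
label_map = {"1a": "4-methoxytoluene"}
record = ReactionRecord(
    reaction_id="rxn-1",
    reactants=["1a"],
    products=["2a"],
    conditions={"solvent": "MeCN", "electrolyte": "LiClO4"},
)
triples = to_triples(record, label_map)
for triple in triples:
    print(triple)
```

In this sketch an unresolved label is passed through unchanged rather than dropped, so a later step could still complete it when more context becomes available.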

Keywords

vision-language models
digital chemistry
data mining
electroorganic synthesis
organic synthesis
photocatalysis
knowledge graphs
databases

Supplementary materials

Title: Supporting Information
Description: Includes supplementary notes on model responses, error analyses, full DOI lists for evaluation datasets, and additional performance statistics.
