Abstract
Data digitisation of scientific literature is essential to expedite the creation of machine-learnable knowledge bases for data-driven research and integration with knowledge-intensive systems like self-driving laboratories. However, automating the extraction, interpretation, and structuring of data from information-rich graphical elements within the prevalent PDF format remains a significant challenge. We present MERMaid (Multimodal aid for Reaction Mining), an end-to-end knowledge ingestion pipeline that automatically converts disparate information conveyed through figures and tables across various PDFs into a coherent and machine-actionable knowledge graph. By leveraging the emergent visual cognition and reasoning capabilities of vision-language models, MERMaid demonstrates chemical context awareness, self-directed context completion, and robust coreference resolution, achieving 87% overall end-to-end accuracy. Notably, MERMaid is topic-agnostic and adaptable to various chemical domains. Its modular design and extensibility facilitate future application to diverse scientific data beyond reaction mining, promising to unlock the full potential of scientific literature for knowledge-intensive applications.
Supplementary materials
Title: Supporting Information
Description: Includes supplementary notes on model responses, error analyses, full DOI lists for evaluation datasets, and additional performance statistics.