Annotating Materials Science Text: A Semi-Automated Approach for Crafting Outputs with Gemini Pro

Hasan M Sayeed; Trupti Mohanty; Taylor Sparks

doi:10.26434/chemrxiv-2024-173dp

Recent advances in large language models (LLMs) have paved the way for automated information extraction in materials science. However, fine-tuning these models, which is crucial for effective machine learning pipelines in the field, is hindered by a lack of pre-annotated data, and laborious manual annotation compounds the problem. To address this, we introduce a tailored semi-automated annotation process that uses Google's Gemini Pro language model. Our approach focuses on two key tasks: extracting information in structured JSON format and generating abstractive summaries from materials science texts. The process is a collaboration between human annotators and the LLM, driven by structured prompts and user-guided examples; it improves annotation quality and strengthens the LLM's grasp of materials science intricacies. Importantly, it reduces human annotation effort because annotators start from the LLM's draft rather than from scratch.
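
The loop described in the abstract (structured prompt, user-guided examples, LLM draft, human correction) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the helper names `build_prompt`, `parse_draft`, and `accept_correction`, the prompt layout, and the JSON field names are all hypothetical, and the call to the language model itself is deliberately left out.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def build_prompt(instruction, examples, text):
    """Assemble a few-shot prompt: instruction, worked examples, then the new text.

    The layout ("Text:" / "JSON:") is a hypothetical convention for this sketch.
    """
    parts = [instruction.strip(), ""]
    for ex_text, ex_output in examples:
        parts.append("Text: " + ex_text)
        parts.append("JSON: " + json.dumps(ex_output, sort_keys=True))
        parts.append("")
    parts.append("Text: " + text)
    parts.append("JSON:")
    return "\n".join(parts)


def parse_draft(raw):
    """Parse the model's draft annotation.

    Returns (data, None) on success, or (None, error_message) so that a
    human annotator can repair the draft by hand.
    """
    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        cleaned = cleaned.strip("`")
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        return json.loads(cleaned), None
    except json.JSONDecodeError as err:
        return None, str(err)


def accept_correction(examples, text, corrected):
    """Return a new example list that includes the human-corrected annotation,
    so later prompts can use it as a user-guided example."""
    return examples + [(text, corrected)]
```

In use, the annotator would send the string from `build_prompt` to the model, pass the reply through `parse_draft`, correct the result, and feed the corrected annotation back in with `accept_correction`, so that each reviewed annotation becomes guidance for the next one.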
