A general small language model (SLM) approach to examining scientific trends through conference proceedings: application to the 2019 and 2024 annual meetings of the Brazilian Chemical Society

25 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Large Language Models (LLMs) have transformed natural language processing. However, their large computational demands limit their accessibility. This has led to the development of Small Language Models (SLMs), which run locally on a microcomputer and thereby make AI-driven language processing widely accessible while giving users greater control over text analysis. In this work, we use an SLM to analyze the evolution of Chemistry in Brazil by comparing data from the 2019 and 2024 annual meetings of the Brazilian Chemical Society (RASBQ). Held since 1978, these annual meetings gather over 2,000 researchers of all levels. We demonstrate the viability of SLMs for extracting and structuring large volumes of text from scientific events, as collected in annals or books of abstracts, thus enabling comprehensive comparative analyses that would otherwise be impractical. Our methodology extracts abstracts from the RASBQ digital proceedings and processes them using SLMs. The models convert the textual content into structured, manipulable data, on which we conduct a semantic and statistical analysis of the two events. The results show that SLMs can efficiently transform unstructured scientific proceedings into tractable data, saving significant time and resources. The comparison between the 2019 and 2024 events revealed notable changes in thematic distribution, institutional participation, and potential regional impacts, and underscored the importance of data standardization in automated analyses. This work ultimately reinforces the growing role of language models as powerful allies in scientific production and analysis, especially when they are used critically and supported by consistent statistical methods.
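The statistical comparison step described in the abstract could look roughly like the sketch below. It assumes the SLM has already emitted one record per abstract with a meeting year and a theme label; the field names `year` and `theme`, the helper names, and the toy records are all hypothetical, and the Pearson chi-square statistic is only one illustrative choice, since the abstract does not say which tests were applied.

```python
from collections import Counter


def theme_counts(records, year):
    """Count abstracts per theme for one meeting year."""
    return Counter(r["theme"] for r in records if r["year"] == year)


def chi_square(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Rows are the two meetings, columns are the themes seen in either one.
    """
    themes = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each meeting needs at least one abstract")
    grand = total_a + total_b
    stat = 0.0
    for theme in themes:
        column = counts_a[theme] + counts_b[theme]  # Counter gives 0 if absent
        for observed, row_total in ((counts_a[theme], total_a),
                                    (counts_b[theme], total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(themes) - 1


# Made-up records for illustration only; these are NOT data from the meetings.
toy_records = (
    [{"year": 2019, "theme": "organic"}] * 3
    + [{"year": 2019, "theme": "analytical"}] * 1
    + [{"year": 2024, "theme": "organic"}] * 1
    + [{"year": 2024, "theme": "analytical"}] * 3
)

stat, dof = chi_square(theme_counts(toy_records, 2019),
                       theme_counts(toy_records, 2024))
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

A larger statistic for a given number of degrees of freedom suggests a stronger shift in thematic distribution between the two meetings. With real data, the theme labels would need to be standardized first, which is the data standardization concern the abstract raises.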

Keywords

Small Language Models (SLM)
Natural Language Processing
Scientific Text Analysis
Chemistry in Brazil
