A general small language model (SLM) approach to examining scientific trends through conference proceedings: application to the 2019 and 2024 annual meetings of the Brazilian Chemical Society

Rubens Souza; Nathalia Rosa; Julio Duarte; Itamar Borges Jr

doi:10.26434/chemrxiv-2025-vjqhg

Chemical Education

25 June 2025, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Large Language Models (LLMs) are a machine learning technique that has transformed natural language processing. However, their large computational demands limit their accessibility. This limitation has led to the development of Small Language Models (SLMs), which run locally on a microcomputer, making AI-driven language processing widely accessible and giving users greater control over text analysis. In this work, we use an SLM to analyze the evolution of Chemistry in Brazil by comparing data from the 2019 and 2024 annual meetings of the Brazilian Chemical Society (RASBQ). These meetings, held annually since 1978, gather over 2,000 researchers of all levels. We demonstrate the viability of SLMs for extracting and structuring large volumes of text from scientific events, as collected in annals or books of abstracts, enabling comprehensive comparative analyses that would otherwise be impractical. Our methodology extracts abstracts from the RASBQ digital proceedings and processes them using SLMs. The models convert the textual content into structured, manipulable data, which allows a semantic and statistical analysis of the two events. The results show that SLMs can efficiently transform unstructured scientific proceedings into tractable data, saving significant time and resources. The comparison between the 2019 and 2024 events revealed notable changes in thematic distribution, institutional participation, and potential regional impacts, and it underscores the importance of data standardization in automated analyses. This work reinforces the growing role of language models as powerful allies in scientific production and analysis, especially when they are used critically and supported by consistent statistical methods.
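To make the comparison step concrete, the following is a minimal sketch of how structured records could be compared across two meetings once a language model has converted each abstract into fields. It is an illustration under stated assumptions, not the authors' pipeline: the record layout, the field names `year` and `theme`, the helper functions `theme_shares` and `share_changes`, and every value in `records` are hypothetical placeholders rather than data from the RASBQ proceedings.

```python
from collections import Counter

# Hypothetical records standing in for what a language model might emit
# after structuring each abstract. Field names and values are placeholders,
# not data from the RASBQ proceedings.
records = [
    {"year": 2019, "theme": "Organic Chemistry"},
    {"year": 2019, "theme": "Organic Chemistry"},
    {"year": 2019, "theme": "Analytical Chemistry"},
    {"year": 2019, "theme": "Chemical Education"},
    {"year": 2024, "theme": "Organic Chemistry"},
    {"year": 2024, "theme": "Analytical Chemistry"},
    {"year": 2024, "theme": "Analytical Chemistry"},
    {"year": 2024, "theme": "Chemical Education"},
]


def theme_shares(records, year):
    """Return the fraction of abstracts per theme for one meeting year."""
    counts = Counter(r["theme"] for r in records if r["year"] == year)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {theme: n / total for theme, n in counts.items()}


def share_changes(records, year_a, year_b):
    """Return (theme, change in share) pairs, largest absolute shift first.

    The change is the share in year_b minus the share in year_a; a theme
    missing from one year counts as a share of zero in that year.
    """
    a = theme_shares(records, year_a)
    b = theme_shares(records, year_b)
    changes = {t: b.get(t, 0.0) - a.get(t, 0.0) for t in set(a) | set(b)}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(changes.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


for theme, delta in share_changes(records, 2019, 2024):
    print(f"{theme}: {delta:+.2f}")
```

Comparing shares rather than raw counts keeps the two meetings comparable when the total number of abstracts differs. A real analysis would also need the theme labels to be standardized before counting, which is the data standardization concern the abstract raises.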

Keywords

Small Language Models (SLM)

Natural Language Processing

Scientific Text Analysis

Chemistry in Brazil

Version History

Jun 25, 2025 Version 1

Metrics

Views

Downloads

Citations

License

The content is available under CC BY NC ND 4.0

Funding

Conselho Nacional de Desenvolvimento Científico e Tecnológico

300281/2025-0

Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro

E-26/204.294/2024 and E-26/205.922/2022

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
