Towards more reproducible and FAIRer research data: documenting provenance during data acquisition using the Infofile format

22 November 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Information, i.e. data, is regarded as the new oil in the 21st century. The impact of this economic statement on science and the research community is reflected in the rapidly increasing number of machine-learning and artificial-intelligence applications, which were one driving force behind formulating the FAIR principles. However, any form of data (re)use requires the provenance of the data to be recorded. Hence, recording metadata during data acquisition is both an essential aspect of science and as old as science itself. Here, we discuss the why, when, what, and how of research data documentation and present a simple textual file format, termed Infofile, developed for this purpose. This format allows researchers in the lab to record all relevant metadata during data acquisition in a user-friendly and intuitive way while minimising external dependencies. The resulting machine-actionable metadata in turn allow processing and analysis software to access relevant information, besides making the research data more reproducible and FAIRer. By demonstrating a simple, yet powerful and proven solution to the problem of recording metadata during data acquisition, we anticipate that the Infofile format and its underlying principles will have a great impact on the reproducibility, and hence the quality, of science, particularly in the field of "little science", which lacks established and well-developed software toolchains and standards.
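To illustrate what "machine-actionable" means for a plain-text metadata file, the following is a minimal sketch of how analysis software might read such a file into a nested dictionary. The layout it parses (capitalised block headers ending in a colon, indented `key: value` lines, `%` comment lines) is a hypothetical simplification chosen for illustration; it is not the Infofile specification, which is given in the supporting information. The function name `parse_metadata_text` and the example contents are likewise hypothetical.

```python
"""Minimal sketch: read a block-structured, human-readable metadata text.

ASSUMPTION: the layout handled here (capitalised block headers ending in a
colon, "key: value" lines below them, "%" comment lines) is a hypothetical
simplification for illustration, NOT the Infofile specification.
"""
from __future__ import annotations


def parse_metadata_text(text: str) -> dict[str, dict[str, str]]:
    """Return {block_name: {key: value}} from a simple metadata text."""
    blocks: dict[str, dict[str, str]] = {}
    current: str | None = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (hypothetical "%" prefix).
        if not line or line.startswith("%"):
            continue
        # A line such as "GENERAL:" opens a new block. Caveat of this
        # simplified rule: an all-capital key with an empty value
        # (e.g. "ID:") would also be read as a block header.
        if line.endswith(":") and line[:-1].isupper():
            current = line[:-1]
            blocks[current] = {}
            continue
        # Inside a block, split at the first colon only, so values
        # containing colons (e.g. times such as "10:15") stay intact.
        if current is not None and ":" in line:
            key, value = line.split(":", 1)
            blocks[current][key.strip()] = value.strip()
    return blocks


# Hypothetical example file contents, not taken from the specification.
example = """\
% Hypothetical example, not taken from the Infofile specification
GENERAL:
  Operator: Jane Doe
  Start time: 2022-11-22 10:15

SAMPLE:
  Name: test sample
"""

meta = parse_metadata_text(example)
print(meta["GENERAL"]["Start time"])  # → 2022-11-22 10:15
```

Keeping the file as plain text means it can be written in any editor during the measurement, while a short parser like this one is enough for processing software to pick up values such as the operator or start time without manual re-entry.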


reproducible research
research data management
provenance metadata
FAIR data
knowledge management
electronic laboratory notebook

Supplementary materials

Supporting information
Specification of the Infofile format; examples for different spectroscopic methods.
