Universal Workflow Language and Software Enables Geometric Learning and FAIR Scientific Protocol Reporting

11 September 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The modern technological landscape has trended towards increased precision and greater digitization of information. However, the methods used to record and communicate scientific procedures have remained largely unchanged over the last century. Written text as the primary means for communicating scientific protocols poses notable limitations in human and machine information transfer. Therefore, successful replication and analysis of experimental protocols necessitates a new approach for high-fidelity communication. In this work, we present the Universal Workflow Language (UWL) and the open-source Universal Workflow Language interface (UWLi). UWL is a graph-based data architecture that can capture arbitrary scientific procedures through workflow representation of protocol steps and embedded procedure metadata. It is machine readable, discipline agnostic, and compatible with Findable, Accessible, Interoperable, and Reusable reporting standards. UWLi is an accompanying software package for building and manipulating UWL files into tabular and plain text representations in a controlled, detailed, and multilingual environment. UWL transcription of protocols from three high-impact publications resulted in the identification of substantial deficiencies in the detail of the reported procedures. UWL transcription of these publications identified seventeen procedural ambiguities and thirty missing parameters for every one hundred words in published procedures. In addition to preventing and identifying procedural omission, UWL files were found to be compatible with geometric learning techniques for representing scientific protocols. In a surrogate function designed to represent an arbitrary multi-step experimental process, graph transformer networks were able to predict outcomes in approximately 6,000 fewer experiments than equivalent linear models. Implementation of UWL and UWLi into the scientific reporting process will result in higher reproducibility between both experimentalists and machines, thus proving an avenue to more effective modeling and control of complex systems.

Keywords

Process data
Reproducibility
Machine learning
Graph learning
Software

Supplementary weblinks

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.