UmetaFlow: An untargeted metabolomics workflow for high-throughput data processing and analysis

19 October 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Metabolomics experiments generate highly complex datasets whose manual inspection is time-consuming, labor-intensive, and sometimes error-prone. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, and molecular formula and structural predictions with integration into the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, and development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). UmetaFlow was validated with in-house LC-MS/MS datasets of actinomycetes producing known secondary metabolites, as well as with commercial standards; it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 dataset was used for benchmarking: UmetaFlow detected 797 of the 836 ground truth features (~95.3%) and accurately quantified 94.9% of these. We anticipate that UmetaFlow will provide a universal platform for the interpretation of large metabolomics datasets.
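The benchmarking figure quoted above can be reproduced with simple arithmetic. The following is a minimal sketch, not part of UmetaFlow itself: the function name `detection_rate` is hypothetical, and only the two counts (797 detected, 836 ground truth features) are taken from the abstract.

```python
def detection_rate(detected: int, ground_truth: int) -> float:
    """Return the fraction of ground truth features that were detected.

    Hypothetical helper for illustration; it is not a UmetaFlow function.
    """
    if ground_truth <= 0:
        raise ValueError("ground_truth must be a positive count")
    if not 0 <= detected <= ground_truth:
        raise ValueError("detected must lie between 0 and ground_truth")
    return detected / ground_truth


# Counts reported for the MTBLS733 benchmark in the abstract.
rate = detection_rate(detected=797, ground_truth=836)
print(f"Detection rate: {rate:.1%}")
```

The same helper could be applied to any other benchmark, provided the ground truth feature list is known in advance.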

Keywords

Untargeted
Metabolomics
Processing
High-throughput
Analysis
Workflow
Software

Supplementary materials

Additional File 1.
Figure S1. A detailed overview of UmetaFlow. Table S1. The optimal parameters for OpenMS (UmetaFlow) for feature detection, formula, and structural predictions of the in-house datasets. Table S2. Feature detection, structural and formula predictions for pyracrimycin A in Streptomyces sp. NBC 00162, Streptomyces sp. CA-210063 and Streptomyces eridani. Table S3. The optimal parameters for OpenMS (UmetaFlow) for feature detection, quantification, and marker selection of the MTBLS733 QE HF dataset.

Additional File 2.
SI_Table_S4: All raw in-house data were analyzed both manually and with UmetaFlow for method validation.

Additional File 3.
SI_Table_S5: Feature detection, structural and formula predictions for the commercial standards germicidins A and B, kanamycin, tetracycline hydrochloride, thiostrepton, globomycin, ampicillin and apramycin.

Additional File 4.
SI_Table_S6: Feature detection, structural and formula predictions for kirromycin and desferrioxamine B from extracts of Streptomyces collinus Tü 365 and epemicins A and B from extracts of Kutzneria sp. CA-103260.
