Abstract
Untargeted metabolomics is evolving into a field of big data science, and there is growing interest within the metabolomics community in mining MS/MS data from public repositories. The theme of this protocol, reverse metabolomics, is a data science strategy that differs from the traditional LC-MS/MS-based untargeted metabolomics approach. In traditional untargeted metabolomics, we first collect samples to address a predefined question and then acquire LC-MS/MS data. We then identify metabolites associated with a phenotype (e.g., disease vs. healthy) and elucidate or validate their structural details (e.g., molecular formula, structural classification, substructure, or complete structural annotation or identification). Reverse metabolomics, by contrast, does not necessarily require collecting new data or structurally characterizing the molecules. Instead, we start with MS/MS spectra of known or unknown molecules and discover phenotype-relevant information such as organ/biofluid distribution, disease condition, intervention status (e.g., pre- and post-intervention), organism (e.g., mammals vs. others), geography, and any other biologically relevant associations available in public repositories. This protocol guides the reader step by step through using available MS/MS data to discover repository-scale associations for the spectra of interest. As an example, we use MS/MS spectra from three small molecules: phenylalanine-cholic acid (a microbially conjugated bile acid) and phenylalanine-C4:0 and histidine-C4:0 (two N-acyl amides). We leverage the GNPS-based framework to explore the microbial producers of these molecules and their associations with health conditions and organ distributions in humans and rodents.
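The core idea of reverse metabolomics, linking repository files that contain a query MS/MS spectrum to their sample metadata, can be illustrated with a minimal Python sketch. It is not the GNPS implementation and calls no GNPS API. The file identifiers, metadata fields, and the `tally_associations` helper are hypothetical names chosen for illustration, and the records are toy values rather than real results.

```python
from collections import Counter

# Hypothetical output of a repository-scale spectral search: the files in
# which the query MS/MS spectrum was matched (toy identifiers, not real data).
matched_files = [
    "dataset_1/a.mzML",
    "dataset_1/b.mzML",
    "dataset_2/c.mzML",
    "dataset_3/d.mzML",  # no metadata available for this file
]

# Hypothetical sample metadata keyed by file (assumed field names).
sample_metadata = {
    "dataset_1/a.mzML": {"organism": "human", "health_status": "disease"},
    "dataset_1/b.mzML": {"organism": "human", "health_status": "healthy"},
    "dataset_2/c.mzML": {"organism": "rodent", "health_status": "healthy"},
}


def tally_associations(matched_files, sample_metadata, field):
    """Count matched files per value of one metadata field.

    Files without a metadata record are skipped; records that lack the
    requested field are counted under "not specified".
    """
    counts = Counter()
    for file_id in matched_files:
        record = sample_metadata.get(file_id)
        if record is None:
            continue  # match found, but no metadata to associate it with
        counts[record.get(field, "not specified")] += 1
    return counts


organism_counts = tally_associations(matched_files, sample_metadata, "organism")
status_counts = tally_associations(matched_files, sample_metadata, "health_status")
```

In this toy setup, the tallies only describe where the spectrum was observed. Interpreting them as associations would still require accounting for how many samples of each category exist in the repository.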
Supplementary materials
Title: Visual Cheatsheet guide
Description: Explanation of the script through screenshots.