raMSI for Ground-Truth Machine Learning of Mass Spectrometry Imaging Data

12 July 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The application of machine learning (ML) to mass spectrometry imaging (MSI) has become one of the most active fields, owing to the unparalleled sensitivity and specificity of MS for molecular detection and the unmatched efficiency and accuracy of ML for pattern recognition. As a first step for ML, binning is the most common method for preprocessing MSI data, yielding millions of m/z bins. However, on closer examination, we find that this approach suffers from a serious ambiguity problem, which raises a fundamental question: was the machine “learning” the intricate MS data properly? In this report, we present a resolution-adaptive method, raMSI, which can attain ground-truth molecular features for large datasets and is compatible with different data formats from mainstream mass analyzers. Building on raMSI, we design an ML ecosystem that includes data collection, data preparation, database construction, exploratory data analysis, modeling, and the acquisition of biological insights. We envision that this platform will motivate cross-disciplinary research involving chemistry, statistics, and biology.
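
The ambiguity attributed to binning above can be illustrated with a minimal sketch. It assumes plain fixed-width binning with a hypothetical bin width of 0.01 m/z, and the peak values are invented for illustration; it does not reproduce raMSI or any dataset from this report. Under those assumptions, two distinct ions can collapse into one bin, while one ion with a small mass drift can be split across two bins.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width):
    """Sum intensities of (m/z, intensity) peaks into fixed-width bins.

    Returns a dict mapping bin index -> summed intensity.
    """
    bins = defaultdict(float)
    for mz, intensity in peaks:
        index = math.floor(mz / bin_width)  # fixed-width bin index
        bins[index] += intensity
    return dict(bins)


BIN_WIDTH = 0.01  # hypothetical bin width in m/z units

# Hypothetical case 1: two distinct ions 0.004 m/z apart.
distinct_ions = [(500.2710, 100.0), (500.2750, 40.0)]
merged = bin_peaks(distinct_ions, BIN_WIDTH)

# Hypothetical case 2: one ion measured twice with a 0.001 m/z drift
# that happens to straddle a bin edge at 500.30.
drifting_ion = [(500.2995, 80.0), (500.3005, 80.0)]
split = bin_peaks(drifting_ion, BIN_WIDTH)

print("distinct ions occupy", len(merged), "bin(s)")
print("one drifting ion occupies", len(split), "bin(s)")
```

In the first case a model would see a single feature where two molecules exist; in the second it would see two features for one molecule. Either way, the bin no longer maps one-to-one onto a molecular species.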

Keywords

Mass Spectrometry Imaging
Machine Learning
Ground Truth Molecular Feature
raMSI
Binning
