Cumulative Neutral Loss Model for Fragment Deconvolution in Electrospray Ionization High-Resolution Mass Spectrometry Data

01 March 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Fragment deconvolution is a crucial step in the componentization of non-targeted analysis (NTA) high-resolution mass spectrometry (HRMS) data. It aims to filter out false positive (FP) signals that do not belong to the component, since including FP fragments can lead to, for example, incorrect identification further down the workflow. Commonly used methods for deconvolution of fragment signals rely on the presence of a time domain (e.g., peak apex retention time difference and correlation analysis). However, when there is no or insufficient MS2 information in the time domain, these methods are unusable and only the mass domain remains. A probability-based cumulative neutral loss (CNL) model for fragment deconvolution using mass domain information was therefore developed to allow deconvolution in such cases. The optimized model, with a mass tolerance of 0.005 Da and a CNL score threshold of -0.95, achieved a true positive rate (TPr) of 95.0%, a false discovery rate (FDr) of 25.6%, and a reduction rate of 39.9%. Additionally, the CNL model was extensively tested on real samples containing predominantly pesticides at different concentration levels and with matrix effects. Overall, the model obtained a TPr above 95%, with FDr values between 45% and 77% and reduction rates between 10% and 24%. Finally, the CNL model was compared with the retention time difference method and peak shape correlation analysis, showing that a combination of correlation analysis and the CNL model was the most effective for fragment deconvolution, obtaining a TPr of 93.1%, an FDr of 57.2%, and a reduction rate of 42.6%.
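The three performance figures quoted in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: TPr is taken as TP / (TP + FN), FDr as FP / (TP + FP), and the reduction rate as the share of all candidate fragments that the filter removes. The `Counts` class, the function name, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical confusion counts for fragments after deconvolution."""
    tp: int  # true fragments that were kept
    fp: int  # false fragments that were kept
    fn: int  # true fragments that were removed
    tn: int  # false fragments that were removed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is empty.
    return numerator / denominator if denominator else float("nan")


def deconvolution_metrics(c: Counts) -> dict:
    """Compute TPr, FDr and reduction rate under the assumed definitions."""
    kept = c.tp + c.fp
    removed = c.fn + c.tn
    total = kept + removed
    return {
        "TPr": _ratio(c.tp, c.tp + c.fn),          # assumed: recall of true fragments
        "FDr": _ratio(c.fp, kept),                 # assumed: false share of kept fragments
        "reduction_rate": _ratio(removed, total),  # assumed: share of fragments removed
    }


# Illustrative counts only; these are not results from the preprint.
example = Counts(tp=95, fp=30, fn=5, tn=70)
metrics = deconvolution_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed definitions a filter can reach a high TPr while still having a high FDr, which is the trade-off the real-sample results in the abstract describe.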


Fragment deconvolution
Bayesian statistics
Machine learning
Non-targeted analysis
High-resolution mass spectrometry

Supplementary materials

Supporting information
Overview of reference compounds and their corresponding samples, ROC curves for the performance assessment of the CNL model using both the database and measured fragments, overview of high-probability CNLs, and case figures for TP, FN, FP, and TN detected fragments.


