Naive Bayes classification model for isotopologue detection in LC-HRMS data



Isotopologue identification or removal is a necessary step to reduce the number of features that need to be identified in samples analyzed with non-targeted analysis. Currently available approaches rely on either predicted isotopic patterns or an arbitrary mass tolerance, which require information on the molecular formula or the instrumental error, respectively. Therefore, a Naive Bayes isotopologue classification model was developed that does not depend on any thresholds or molecular formula information. This classification model uses the elemental mass defects (EMDs) of six elemental ratios and successfully identifies isotopologues in both theoretical isotopic patterns and wastewater influent samples. It outperforms one of the most commonly used approaches, the 1.0033 Da mass difference method implemented in CAMERA.
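The abstract describes the classifier only at a high level. The following is a minimal Python sketch (standard library only) of the general idea, not the authors' implementation. It assumes several things not stated here. It uses a Kendrick-style mass defect as a stand-in for the EMD. The six "elemental ratios" are replaced by six hypothetical base units (CH2, H2O, O, NH, HCl, S). The feature for each unit is the wrapped difference between the defects of the candidate isotopologue and the monoisotopic peak. The classifier is a plain Gaussian Naive Bayes trained on synthetic toy pairs, not real data. The `mass_difference_match` function mimics the 1.0033 Da baseline. Its tolerance `tol` is a hypothetical value, and it shows the arbitrary threshold that the Naive Bayes approach avoids.

```python
import math
import random

# Monoisotopic masses (Da) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "S": 31.97207117, "Cl": 34.96885268}

# Hypothetical base units standing in for the six elemental ratios (assumption).
UNITS = {
    "CH2": MONO["C"] + 2 * MONO["H"],
    "H2O": 2 * MONO["H"] + MONO["O"],
    "O": MONO["O"],
    "NH": MONO["N"] + MONO["H"],
    "HCl": MONO["H"] + MONO["Cl"],
    "S": MONO["S"],
}
C13_SHIFT = 1.0033548  # 13C - 12C mass difference (Da)


def kendrick_defect(mass, unit):
    """Kendrick-style mass defect of `mass` for a base unit (assumed EMD stand-in)."""
    scaled = mass * round(unit) / unit
    return round(scaled) - scaled


def features(m_mono, m_iso):
    """One feature per base unit: defect difference wrapped into [-0.5, 0.5]."""
    feats = []
    for unit in UNITS.values():
        d = kendrick_defect(m_iso, unit) - kendrick_defect(m_mono, unit)
        feats.append(d - round(d))
    return feats


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: independent normal likelihood per feature and class."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Variance floor avoids division by zero for constant features.
            variances = [max(sum((v - mu) ** 2 for v in c) / len(c), 1e-12)
                         for c, mu in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, x):
        """Return the class with the highest log prior + summed log likelihood."""
        scores = {}
        for label, (log_prior, means, variances) in self.stats.items():
            scores[label] = log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                for v, mu, var in zip(x, means, variances))
        return max(scores, key=scores.get)


def toy_training_set(n=500, seed=0):
    """Synthetic pairs (toy data): True = 13C isotopologue, False = unrelated peak."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        m = rng.uniform(100.0, 800.0)
        # Isotopologue: 13C shift plus ~1 mDa of assumed measurement noise.
        X.append(features(m, m + C13_SHIFT + rng.gauss(0.0, 0.001)))
        y.append(True)
        # Non-isotopologue: an unrelated peak roughly 1 Da away.
        X.append(features(m, m + rng.uniform(0.95, 1.05)))
        y.append(False)
    return X, y


def mass_difference_match(m_mono, m_iso, tol=0.005):
    """Baseline: pair matches if its mass difference is within `tol` Da (hypothetical) of 1.0033."""
    return abs((m_iso - m_mono) - C13_SHIFT) <= tol


X, y = toy_training_set()
model = GaussianNaiveBayes().fit(X, y)
print(model.predict(features(300.0, 300.0 + C13_SHIFT)))  # True
print(model.predict(features(300.0, 300.984)))            # False
```

Because each feature here depends mostly on the mass difference, this stand-in is much simpler than the paper's EMD-based features. It only illustrates how per-ratio mass-defect features can feed a threshold-free probabilistic classifier.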


Supplementary material

Supporting information for: Naive Bayes classification model for isotopologue detection in LC-HRMS data
This Supporting Information contains: information on the presence of the elemental ratios for the chemicals in the DSSTox database; an overview of the correlation coefficients between the EMDmono and EMDiso values for the different elemental ratios, with scatter plots for the two most extreme correlations; receiver operating characteristic (ROC) curves for the classification model and the mass difference method, used to select the scoreEMD; the reference compound list used to assess the performance of the classification model and the mass difference method on wastewater influent samples; and an example of a false-positive (FP) isotopologue detected by the classification model.
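The ROC curves are described as the basis for selecting the scoreEMD. A minimal Python sketch of that kind of selection follows. It assumes that scoreEMD acts as a cutoff on a per-pair score, where a pair is predicted to be an isotopologue when its score is at or above the cutoff. It also assumes that the cutoff maximizes Youden's J (TPR minus FPR). Both are assumptions, since the selection criterion is not stated in this section. The function names `roc_points` and `youden_threshold` are hypothetical.

```python
def roc_points(scores, labels):
    """(threshold, FPR, TPR) for each distinct score; positive when score >= threshold."""
    pos = sum(1 for lab in labels if lab)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and not lab)
        points.append((t, fp / neg, tp / pos))
    return points


def youden_threshold(scores, labels):
    """Cutoff maximising TPR - FPR (Youden's J); an assumed selection rule."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [True, True, True, False, False, False]
print(youden_threshold(scores, labels))  # 0.7
```

Other criteria, such as a fixed maximum false-positive rate, would trade sensitivity for specificity differently.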