Prioritization of unknown features based on predicted toxicity categories

21 November 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Complex environmental samples contain a diverse array of known and unknown constituents. While Liquid Chromatography coupled with High-Resolution Mass Spectrometry (LC-HRMS) Non-Targeted Analysis (NTA) has emerged as an essential tool for the comprehensive study of such samples, the identification of individual constituents remains a significant challenge, primarily due to the vast number of features detected in each sample. To address this, prioritization strategies are frequently employed to narrow the focus to the most relevant features for further analysis. In this study, we developed a novel prioritization strategy that directly links fragmentation and chromatographic data to aquatic toxicity categories, bypassing the need for individual compound identification. Given that features are not always well characterized through fragmentation, we created two models: 1) a Random Forest Classification (RFC) model, which classifies fish toxicity categories based on MS1, retention, and fragmentation data (expressed as cumulative neutral losses, CNLs) when fragmentation information is available, and 2) a Kernel Density Estimation (KDE) model that relies solely on retention time and MS1 data when fragmentation is absent. Both models demonstrated accuracy comparable to structure-based prediction methods. We further tested the models on a pesticide mixture in a tea extract measured by LC-HRMS, where the CNL-based RFC model achieved an accuracy of 0.76 and the KDE model reached 0.61, showcasing their performance in real-world applications.
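
As an illustration of the two-model setup described in the abstract, the sketch below shows one plausible way to turn a fragmentation spectrum into a CNL feature vector and to decide which model a feature would be routed to. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 1 Da bin width, the 200 Da upper limit, and the example m/z values are all hypothetical, and CNLs are taken here simply as the mass differences between the precursor ion and each lighter fragment.

```python
from typing import List


def cumulative_neutral_losses(precursor_mz: float, fragment_mzs: List[float]) -> List[float]:
    """Return precursor-minus-fragment mass differences, sorted ascending.

    Assumption: a CNL is the difference between the precursor m/z and a
    fragment m/z; fragments at or above the precursor m/z are ignored.
    """
    return sorted(round(precursor_mz - mz, 4) for mz in fragment_mzs if mz < precursor_mz)


def cnl_bit_vector(losses: List[float], max_loss: float = 200.0, bin_width: float = 1.0) -> List[int]:
    """Encode losses as a presence/absence vector over fixed-width bins.

    Assumption: the bin width and upper limit are illustrative values only.
    """
    n_bins = int(max_loss / bin_width)
    bits = [0] * n_bins
    for loss in losses:
        idx = int(loss / bin_width)
        if 0 <= idx < n_bins:
            bits[idx] = 1
    return bits


def choose_model(fragment_mzs: List[float]) -> str:
    """Route a feature: RFC when fragments exist, KDE when they do not."""
    return "RFC" if fragment_mzs else "KDE"


# Hypothetical feature: precursor at m/z 300.0 with two lighter fragments.
losses = cumulative_neutral_losses(300.0, [282.0, 254.0, 300.0])
bits = cnl_bit_vector(losses)
```

In a full workflow, a vector like `bits` would be combined with the MS1 m/z and retention information before being passed to a classifier, while features without fragments would fall back to the retention-and-MS1-only model.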

Keywords

Non-targeted analysis
Machine Learning
Prioritization
LC-HRMS

Supplementary materials

Title: Supporting information for: Prioritization of unknown features based on predicted toxicity categories
Description: The supporting information describes the acquisition procedure for the pesticide mixture and the results of model validation.
