A Novel Chemical Space Dependent Strategy for Compound Selection in Non-Target LC–HRMS Method Development Using Physicochemical and Structural Data

20 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The virtual chemical space of substances, including emerging contaminants relevant to the environment and the exposome, is continuously expanding. Comprehensive measurement of specific regions of chemical (sub)space relies on non-targeted analysis (NTA) using liquid chromatography high-resolution mass spectrometry (LC–HRMS). Internal standards are generally used to balance selectivity and sensitivity during NTA sample preparation and method development, under the assumption that structure and analytical behavior are linearly related for all chemicals of interest. Because this assumption does not hold for wide and complex chemical subspaces, the chemical space actually measured by NTA methods is narrowed to a reduced set of chemically and structurally similar compounds. To understand the true coverage of the measurable chemical space by NTA methods, we present a data-driven strategy for the unbiased sampling of structures for LC–HRMS method development from a vast chemical subspace of interest (e.g., US-EPA CompTox, >1 million chemicals). This workflow mines candidate structures that maximize physicochemical diversity, to achieve reliable coverage of the chemical subspace. Measurable compound lists (MCLs) are sampled using precomputed PubChem physicochemical properties (e.g., molecular weight and XLogP) together with mobility and ionization efficiency predicted from molecular fingerprints. This approach significantly advances the selection of heterogeneous, LC–HRMS-compatible structures for NTA method development, validation, and assessment of method-space boundaries, while preserving the original diversity of the selected chemical space. The MCL sampled from the CompTox space exhibited greater chemical coverage and broader predicted LC–HRMS applicability than common “watch list” contaminants.
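The core sampling idea (project physicochemical descriptors onto principal components, grid the resulting space, and draw heterogeneous compounds from it) can be illustrated with a minimal sketch. It is loosely inspired by the principal component analysis and symmetric gridding steps listed in the Supplementary Materials, but it is not the authors' workflow: it assumes a numeric descriptor matrix (e.g., molecular weight, XLogP, predicted mobility, predicted ionization efficiency) is already available, and the function names, the one-compound-per-cell rule, the closest-to-centre selection and the cell-occupancy coverage metric are all hypothetical choices made for illustration.

```python
import numpy as np


def autoscale(X: np.ndarray) -> np.ndarray:
    """Centre each descriptor column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are centred but left unscaled
    return (X - X.mean(axis=0)) / std


def pca_scores(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project autoscaled descriptors onto the leading principal components via SVD."""
    Z = autoscale(X)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)  # rows of vt are principal axes
    return Z @ vt[:n_components].T


def grid_cells(scores: np.ndarray, n_bins: int):
    """Bin PC scores on a grid that is symmetric about the origin of every axis.

    Returns per-compound cell indices (one column per PC) and the bin centres.
    """
    limit = float(np.abs(scores).max()) or 1.0  # guard against all-zero scores
    edges = np.linspace(-limit, limit, n_bins + 1)
    centres = (edges[:-1] + edges[1:]) / 2
    # Digitizing against the interior edges yields indices 0 .. n_bins - 1.
    cells = np.digitize(scores, edges[1:-1])
    return cells, centres


def sample_one_per_cell(scores: np.ndarray, n_bins: int) -> list[int]:
    """Hypothetical rule: for every occupied cell, keep the compound nearest its centre."""
    cells, centres = grid_cells(scores, n_bins)
    best: dict = {}  # cell key -> (distance to centre, row index)
    for i, cell in enumerate(cells):
        key = tuple(int(c) for c in cell)
        dist = float(np.linalg.norm(scores[i] - centres[cell]))
        if key not in best or dist < best[key][0]:
            best[key] = (dist, i)
    return sorted(i for _, i in best.values())


def cell_coverage(scores: np.ndarray, subset: list[int], n_bins: int) -> float:
    """Fraction of grid cells occupied by the full set that the subset also occupies."""
    cells, _ = grid_cells(scores, n_bins)
    all_cells = {tuple(int(c) for c in row) for row in cells}
    sub_cells = {tuple(int(c) for c in cells[i]) for i in subset}
    return len(sub_cells) / len(all_cells)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a descriptor table; NOT real CompTox or PubChem data.
    X = rng.normal(size=(5000, 4))
    scores = pca_scores(X, n_components=2)
    mcl = sample_one_per_cell(scores, n_bins=10)
    print(len(mcl), "compounds selected")
    print("MCL coverage:", cell_coverage(scores, mcl, n_bins=10))  # 1.0 by construction
    # Baseline: an equally sized but arbitrary subset (the first rows of the table).
    baseline = list(range(len(mcl)))
    print("baseline coverage:", cell_coverage(scores, baseline, n_bins=10))
```

Comparing the grid-sampled subset with an arbitrary subset of the same size mirrors, in a simplified way, the abstract's comparison between the sampled MCL and a "watch list": a subset chosen without regard to position in descriptor space typically occupies fewer cells. The bin count and number of components are free parameters here and would need to be justified for real data.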

Keywords

Non-target analysis
Chemical Space
Exposomics
Emerging Contaminants
Mobility
Ionization Efficiency
Liquid Chromatography
Mass Spectrometry

Supplementary materials

Supplementary Materials
Contents
S1. Dataset creation
  S1.1 CompTox database and fingerprints calculation
  S1.2 PubChem descriptors and EMD calculations
  S1.3 Mobility and ionization efficiency predictions
S2. Principal component analysis of the CompTox dataset
  S2.1 Principal component analysis outputs
  S2.2 Symmetric gridding
S3. Chemicals of European monitoring lists
S4. MCL selection & validation
  S4.1 Chemical coverage of sampled MCLs
  S4.2 Standard availability of MCL candidates
