Abstract
The virtual chemical space of substances, including emerging contaminants relevant to the environment and the exposome, is continuously expanding. Comprehensive measurement of specific regions of chemical (sub)space relies on non-targeted analysis (NTA) using liquid chromatography high-resolution mass spectrometry (LC–HRMS). Internal standards are generally used to balance selectivity and sensitivity during NTA sample preparation and method development, under the assumption of a linear relationship between structure and behavior for all chemicals of interest. Because this assumption does not hold for wide and complex chemical subspaces, the space actually measured by NTA methods is narrowed to a reduced set of chemically or structurally similar compounds. To understand the true coverage of the measurable chemical space by NTA methods, we present a data-driven strategy that enables unbiased sampling of structures for LC–HRMS method development from a vast chemical subspace of interest (e.g., US-EPA CompTox, >1 million chemicals). The workflow mines candidate structures while maximizing their physicochemical diversity, for reliable coverage of the chemical subspace. Measurable compound lists (MCLs) are sampled using precomputed PubChem physicochemical properties (e.g., molecular weight and XLogP) together with mobility and ionization efficiency predicted from molecular fingerprints. This approach advances the selection of heterogeneous structures compatible with LC–HRMS analysis for NTA method development, validation, and method space boundary assessment, while preserving the original diversity of the selected chemical space. The MCL sampled from the CompTox space exhibited greater chemical coverage and broader predicted LC–HRMS applicability than common “watch list” contaminants.
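As a rough illustration of diversity-preserving sampling, the sketch below projects a descriptor table onto two principal components, overlays an equal-width grid, and keeps one compound per occupied cell. It is not the authors' implementation: the function names (`pca_scores`, `grid_sample`), the equal-width grid, the nearest-to-center selection rule, the bin count, and the synthetic descriptor table are all assumptions made for illustration.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project standardized descriptors onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized matrix; rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def grid_sample(scores, n_bins=5):
    """Return one row index per occupied grid cell (the point nearest the cell center)."""
    lo = scores.min(axis=0)
    hi = scores.max(axis=0)
    width = (hi - lo) / n_bins
    width[width == 0] = 1.0
    # Assign each point to a cell; clip so the maximum falls in the last cell
    cells = np.minimum(((scores - lo) / width).astype(int), n_bins - 1)
    centers = lo + (cells + 0.5) * width
    dist = np.linalg.norm(scores - centers, axis=1)
    chosen = {}
    for i, cell in enumerate(map(tuple, cells)):
        if cell not in chosen or dist[i] < dist[chosen[cell]]:
            chosen[cell] = i
    return sorted(chosen.values())

# Synthetic stand-in for a descriptor table (hypothetical data, not CompTox)
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(1000, 4))
scores = pca_scores(descriptors)
selected = grid_sample(scores, n_bins=5)
```

Because each occupied cell contributes exactly one compound, sparsely populated regions of the projected space are represented as often as dense ones, which is the general idea behind sampling for coverage rather than for frequency.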
Supplementary materials
Title
Supplementary Materials
Description
Contents
S1. Dataset creation
S1.1 CompTox database and fingerprints calculation
S1.2 PubChem descriptors and EMD calculations
S1.3 Mobility and ionization efficiency predictions
S2. Principal component analysis of the CompTox dataset
S2.1 Principal component analysis outputs
S2.2 Symmetric gridding
S3. Chemicals of European monitoring lists
S4. MCL selection & validation
S4.1 Chemical coverage of sampled MCLs
S4.2 Standard availability of MCL candidates
Supplementary weblinks
Title
MCL_selection_workflow
Description
This repository contains the code used in the paper: "A Novel Chemical Space Dependent Strategy for Compound Selection in Non-Target LC–HRMS Method Development Using Physicochemical and Structural Data".
Title
Supplementary_MCL_dataset
Description
This repository contains the datasets to compute measurable compound lists (MCLs).