Abstract
Quantifying the individual components of complex mixtures is a challenge found throughout the life and physical sciences. An improved capacity to generate large datasets, along with the uptake of machine-learning (ML) based analysis tools, has allowed various ‘omics’ disciplines to realize exceptional advances. Other areas of chemistry that deal with complex mixtures often cannot leverage these advances. Environmental samples, for example, can be more difficult to access, and the resulting small datasets are less appropriate for unconstrained ML approaches. Herein, we present an approach to address this latter issue. Using a very small environmental dataset (35 high-resolution mass spectra gathered from various solvent extractions of Canadian petroleum fractions), we show that the application of specific domain knowledge can lead to ML models with notable performance.
Supplementary materials
Title
Supporting Information
Description
General methods, materials, and details of all extraction conditions performed. Additional figures showing the absolute difference after extraction (Fig. 2) or ML performance (Fig. 5) for all extractions (PDF).
Supplementary weblinks
Title
GitHub page for this paper
Description
The public GitHub page corresponding to this paper. Contains all code relevant to the paper, as well as the raw data and a selection of trained ML models.