Machine Learning in Complex Organic Mixtures: Applying Domain Knowledge Allows for Meaningful Performance with Small Datasets.

18 April 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Quantifying the individual components of complex mixtures is a challenge throughout the life and physical sciences. An improved capacity to generate large datasets, together with the uptake of machine-learning (ML)-based analysis tools, has allowed various ‘omics’ disciplines to make exceptional advances. Other areas of chemistry that deal with complex mixtures often cannot leverage these advances. Environmental samples, for example, can be more difficult to access, and the resulting small datasets are poorly suited to unconstrained ML approaches. Herein, we present an approach to address this latter issue. Using a very small environmental dataset (35 high-resolution mass spectra gathered from various solvent extractions of Canadian petroleum fractions), we show that applying specific domain knowledge can yield ML models with notable performance.

Keywords

Machine Learning
Domain Knowledge
Petroleomics
High-Resolution Mass Spectrometry

Supplementary materials

Supporting Information
General methods, materials, and details, along with all extraction conditions performed. Additional figures showing the absolute difference after extraction (Fig. 2) or ML performance (Fig. 5) for all extractions (PDF).
