Abstract
Marine systems are chemically complex. Understanding the compounds that make up the chemical diversity of marine samples is critical to understanding ecological and ocean health metrics. Pairing Raman spectroscopy with machine learning combines a low-cost, highly portable analytical technique with a powerful and rapid computational approach that can aid marine analysis. Here we use Raman spectroscopy and machine learning to quantify three chemically relevant compounds, each representing a distinct class: glucose for saccharides, butyric acid for fatty acids, and glycine as an amino acid proxy for proteins. Eight machine learning models (gradient boosted regressors, random forests, histogram gradient boosted regressors, decision trees, k nearest neighbors, support vector regression, multilayer perceptrons, and multivariate linear regression) were tested for their accuracy in predicting the concentrations of glycine, glucose, and butyric acid in marine samples. Support vector regression gave the best predictions for all three compounds. Butyric acid was described similarly well by gradient boosted regression and histogram gradient boosted regression. This work shows that Raman spectroscopy, although less sensitive than mass spectrometry, can still be used to quantify mM concentrations of organics in a complex aqueous matrix. The described methodology has the potential to significantly advance rapid field analysis of marine samples.
Supplementary materials
Title
Supplemental Information
Description
Supplemental figures and information for manuscript.