Abstract
Quantification is a challenge for non-targeted analysis (NTA) with liquid chromatography–high resolution mass spectrometry (LC–HRMS) because analytical standards are often lacking. Quantification via structure-based predicted ionization efficiency (IE) has been found to provide the highest accuracy in estimating concentration. However, confident analyte identification is difficult, as several candidate structures may be plausible. This uncertainty limits the reliability of structure-based IE prediction models, since quantification can be severely compromised when chemicals are wrongly (tentatively) identified or when no candidate structures are available. Here we investigate the possibility of using cumulative neutral losses (CNLs) from fragmentation spectra (i.e. MS2) to predict the logIE. The first model was based on molecular fingerprints and was applied to structurally identified analytes. PubChem fingerprints performed best, with a root-mean-square error (RMSE) of 0.72 logIE units for the test set. The second model was trained on the MS2 spectrum, expressed as cumulative neutral losses. This approach is applicable to analytes with unknown structures and showed promising results, with an RMSE of 0.79 logIE units for the test set. The prediction models were compiled in a Julia package, which is publicly available on GitHub and may be used as part of a quantification workflow to estimate concentrations of identified and unidentified compounds in NTA.
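To make the CNL representation and the reported error metric more concrete, the following is a minimal Python sketch, not the authors' Julia implementation. It assumes that a cumulative neutral loss is the mass difference between the precursor ion and each fragment ion in the MS2 spectrum, and that the losses are encoded as a binary vector over fixed-width mass bins. The function names, the bin width, and the mass range are hypothetical choices made for illustration only.

```python
import math


def cumulative_neutral_losses(precursor_mz, fragment_mzs, decimals=2):
    """Return sorted, unique mass differences between precursor and fragments.

    Assumption: "cumulative" means each loss is measured from the precursor
    ion, not between consecutive fragments.
    """
    losses = set()
    for mz in fragment_mzs:
        loss = round(precursor_mz - mz, decimals)
        if loss > 0:  # skip the precursor peak itself and any heavier peaks
            losses.add(loss)
    return sorted(losses)


def cnl_bit_vector(losses, max_mass=100.0, bin_width=1.0):
    """Encode losses as a 0/1 vector over fixed-width mass bins (hypothetical binning)."""
    n_bins = int(max_mass / bin_width)
    bits = [0] * n_bins
    for loss in losses:
        idx = int(loss / bin_width)
        if 0 <= idx < n_bins:
            bits[idx] = 1
    return bits


def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed logIE values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative spectrum with made-up m/z values (not data from the study).
example_losses = cumulative_neutral_losses(200.0, [182.0, 154.0, 200.0])
example_features = cnl_bit_vector(example_losses)
```

A vector such as `example_features` could serve as the input to a regression model that predicts logIE without requiring a known structure; `rmse` is the metric behind the 0.72 and 0.79 logIE-unit figures quoted above.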
Supplementary materials
Title
Supporting Information
Description
The Supporting Information contains: a histogram of the pH distribution in the IE dataset (Figure S1); detailed information on the hyperparameter optimization results (Table S1); residual plots of the final fingerprint (FP) and CNL models (Figures S2 and S3); a list of the compounds with the ten highest prediction errors for the FP and CNL models (Table S2); and additional details and plots from the investigation of how charge delocalisation and molecular weight (MW) correlate with the prediction error of the FP model (Section S4, Figures S4 and S5).