Ionization Efficiency Prediction of Electrospray Ionization Mass Spectrometry Analytes based on Molecular Fingerprints and Cumulative Neutral Losses

13 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Quantification is a challenge for non-targeted analysis (NTA) with liquid chromatography–high resolution mass spectrometry (LC–HRMS) because analytical standards are often unavailable. Quantification via structure-based predicted ionization efficiency (IE) was found to provide the highest accuracy in estimating concentration. However, confident analyte identification is difficult, as multiple candidate structures may be plausible. This uncertainty limits the reliability of structure-based IE prediction models, since quantification can be severely compromised when a chemical is wrongly (tentatively) identified or when no candidate structures are available. Here we investigate the possibility of using cumulative neutral losses from fragmentation spectra (i.e. MS2) to predict the logarithm of the IE (logIE). The first model was based on molecular fingerprints and was applied to structurally identified analytes. PubChem fingerprints performed best, with a root-mean-square error (RMSE) of 0.72 logIE units for the test set. The second model was trained on the MS2 spectrum expressed as cumulative neutral losses. This approach is applicable to analytes with unknown structures and showed promising results, with an RMSE of 0.79 logIE units for the test set. The prediction models were compiled in a Julia package, which is publicly available on GitHub and may be used as part of a quantification workflow to estimate concentrations of identified and unidentified compounds in NTA.
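
To illustrate the two quantities the abstract relies on, the following is a minimal Python sketch, not the authors' Julia implementation. It assumes a neutral loss is the precursor m/z minus a fragment m/z and that losses are encoded as a fixed-width binary vector; the abstract does not define the encoding, and the published package may compute cumulative neutral losses differently. The function names, bin width, and maximum loss are hypothetical. The RMSE function follows the standard definition and would be applied to predicted and measured logIE values.

```python
import math


def neutral_loss_bits(precursor_mz, fragment_mzs, max_loss=100.0, bin_width=1.0):
    """Encode an MS2 spectrum as a binary vector of neutral-loss bins.

    Assumption: a neutral loss is precursor m/z minus fragment m/z.
    The binning scheme (max_loss, bin_width) is illustrative only.
    """
    n_bins = int(round(max_loss / bin_width))
    bits = [0] * n_bins
    for mz in fragment_mzs:
        loss = precursor_mz - mz
        # Skip fragments at or above the precursor and losses outside the range.
        if 0.0 < loss < max_loss:
            bits[int(loss // bin_width)] = 1
    return bits


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical spectrum: precursor at m/z 200 with two fragments below it.
features = neutral_loss_bits(200.0, [182.0, 156.0], max_loss=50.0)
```

A vector such as `features` could serve as the input to a regression model when no structure is available, while a molecular fingerprint would play the same role for identified analytes.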

Keywords

Ionization efficiency
HRMS
NTS/NTA
Machine learning
Semi-quantification

Supplementary materials

Title: Supporting Information
Description: The Supporting Information contains a histogram of the pH distribution in the IE dataset (Figure S1); detailed information on the hyperparameter optimization results (Table S1); residual plots of the final fingerprint (FP) and cumulative neutral loss (CNL) models (Figures S2 and S3); a list of the compounds with the ten highest prediction errors for the FP and CNL models (Table S2); and additional details and plots from the investigation of how charge delocalisation and MW correlate with the prediction error of the FP model (Section S4, Figures S4 and S5).
