Ionization Efficiency Prediction of Electrospray Ionization Mass Spectrometry Analytes based on Molecular Fingerprints and Cumulative Neutral Losses

Alexandros Nikolopoulos; Denice van Herwerden; Viktoriia Turkina; Anneli Kruve; Melissa Baerenfaenger; Saer Samanipour

doi:10.26434/chemrxiv-2025-dc9gd

Analytical Chemistry

13 June 2025, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Quantification is a challenge for non-targeted analysis (NTA) with liquid chromatography–high resolution mass spectrometry (LC–HRMS), due to the lack of analytical standards. Quantification via structure-based predicted ionization efficiency (IE) was found to provide the highest accuracy in estimating concentration. However, confident analyte identification is difficult, as multiple candidate structures may be plausible. This uncertainty in identification limits the reliability of structure-based IE prediction models, since quantification can be severely compromised when a chemical is wrongly (tentatively) identified or when no candidate structures are available. Here we investigate the possibility of using cumulative neutral losses from fragmentation spectra (i.e. MS2) to predict the logIE. The first model was based on molecular fingerprints and was applied to structurally identified analytes. PubChem fingerprints performed best, with a root-mean-square error (RMSE) of 0.72 logIE units for the test set. The second model was trained on the MS2 spectrum, expressed as cumulative neutral losses. This approach is applicable to analytes with unknown structures and showed promising results, with an RMSE of 0.79 logIE units for the test set. The prediction models were compiled in a Julia package, which is publicly available on GitHub, and may be used as part of a quantification workflow to estimate concentrations of identified and unidentified compounds in NTA.
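
As an illustration of the two quantities the abstract relies on, the following minimal Python sketch shows how an MS2 spectrum might be turned into a cumulative-neutral-loss feature vector and how an RMSE in logIE units is computed. It is not the authors' Julia implementation. It assumes that a cumulative neutral loss is the mass difference between the precursor ion and a fragment ion, and that losses are encoded as a binary vector over fixed-width mass bins; the function names, the tolerance, and the bin settings are hypothetical choices made for this example.

```python
import math


def cumulative_neutral_losses(precursor_mz, fragment_mzs, tol=0.01):
    """Return sorted precursor-minus-fragment mass differences.

    Assumed definition for this sketch; differences at or below `tol`
    (i.e. the precursor peak itself) are discarded.
    """
    losses = [precursor_mz - mz for mz in fragment_mzs]
    return sorted(loss for loss in losses if loss > tol)


def binned_vector(losses, max_mass=100.0, bin_width=1.0):
    """Encode losses as a binary presence vector (hypothetical binning)."""
    n_bins = int(max_mass / bin_width)
    vec = [0] * n_bins
    for loss in losses:
        idx = int(loss / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] = 1
    return vec


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted logIE values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy spectrum with made-up m/z values, for illustration only.
losses = cumulative_neutral_losses(200.0, [182.0, 154.0, 200.0])
features = binned_vector(losses, max_mass=50.0, bin_width=1.0)
```

A vector such as `features` could then be passed to any regression model in place of a molecular fingerprint, which is what makes this representation usable when no structure has been assigned.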

Keywords

Ionization efficiency

Supplementary materials

Supporting Information

The Supporting Information contains: a histogram of the pH distribution in the IE dataset (Figure S1); detailed information on the hyperparameter optimization results (Table S1); residual plots of the final fingerprint (FP) and cumulative neutral loss (CNL) models (Figures S2 and S3); a list of the compounds with the ten highest prediction errors for the FP and CNL models (Table S2); and additional details and plots from the investigation of how charge delocalisation and molecular weight (MW) correlate with the prediction error of the FP model (Section S4, Figures S4 and S5).

License

The content is available under CC BY 4.0

Funding

ChemistryNL

UvA Data Science Center

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
