Evaluating the generalizability of graph neural networks for predicting collision cross section

Chloe Engler Hart; António José Preto; Shaurya Chanana; David Healey; Tobias Kind; Daniel Domingo-Fernández

doi:10.26434/chemrxiv-2024-32j2t

Theoretical and Computational Chemistry


17 May 2024, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Ion Mobility coupled with Mass Spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross-section (CCS) values, which are indicative of molecular size and shape. However, the effective application of CCS values in structural analysis is still constrained by the limited availability of experimental data, necessitating the development of accurate machine learning (ML) models for in silico predictions. In this study, we evaluated state-of-the-art Graph Neural Networks (GNNs) trained to predict CCS values using the largest publicly available dataset to date. Although our results confirm the high accuracy of these models within chemical spaces similar to their training data, their performance declines significantly when applied to structurally novel regions. This discrepancy raises concerns about the reliability of in silico CCS predictions and underscores the need for releasing further publicly available CCS datasets. To mitigate this, we demonstrate how generalization can be partially improved by extending models to account for additional features such as molecular fingerprints, descriptors, and molecule types. Lastly, we also show how confidence models can enhance the reliability of the CCS estimates.

Keywords

Collision cross section

Graph neural networks

Machine learning

Mass spectrometry

Supplementary materials


Title

Supplementary Text, Figures, and Tables associated with the manuscript.

Description

Supplementary Figure 1. Highest Tanimoto similarities for the most similar compound in CCSBase v1.3 (A) and METLIN-CCS (B). (C) and (D) are equivalent distributions but applying a filter for each on molecules that share the same Murcko scaffold (equivalent to applying a Murcko scaffold split).

Supplementary Figure 2. Data splitting strategy.

Supplementary Figure 3. Overlap of the 100 predictions with largest deviation from the original values for the three models when training and evaluating on CCSBase.

Supplementary Figure 4. Representation of METLIN-CCS (yellow) and CCSBase (blue) by reducing molecular fingerprints into two dimensions using the t-SNE dimensionality reduction method (t-distributed stochastic neighbor embedding).

Supplementary Figure 5. Highest Tanimoto similarities for the most similar compound between the two databases (CCSBase and METLIN-CCS).
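The two ideas named in these captions, the highest Tanimoto similarity to the most similar compound and a scaffold-based split, can be sketched as follows. This is a minimal illustration and not the authors' code: fingerprints are represented as plain Python sets of on-bit indices and scaffolds as precomputed string keys (in practice both would typically be computed with a cheminformatics toolkit), and every function name and toy value below is hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(a | b)


def highest_similarity(query, reference):
    """For each query fingerprint, similarity to its most similar reference fingerprint."""
    return [max(tanimoto(q, r) for r in reference) for q in query]


def scaffold_split(scaffolds, test_fraction=0.2):
    """Assign whole scaffold groups to train or test so that no scaffold is shared.

    `scaffolds` holds one precomputed scaffold key per molecule (hypothetical input).
    Larger groups are placed first, a common heuristic for scaffold splits.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)
    train_target = len(scaffolds) * (1 - test_fraction)
    train, test = [], []
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        members = groups[key]
        if len(train) + len(members) <= train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


# Toy data (made up for illustration only).
fingerprints = [{1, 2, 3}, {1, 2, 4}, {1, 7, 8}, {7, 8, 9}, {20}]
scaffolds = ["A", "A", "B", "B", "C"]

train_idx, test_idx = scaffold_split(scaffolds, test_fraction=0.5)
nearest = highest_similarity(
    [fingerprints[i] for i in test_idx],
    [fingerprints[i] for i in train_idx],
)
```

Low values in `nearest` indicate test molecules that are structurally distant from everything in the training set, which is the regime where the abstract reports that prediction performance declines.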


Supplementary weblinks


Title

Scripts and data

Description

Source code


Title

Predictions

Description

Benchmark predictions



License

The content is available under CC BY NC 4.0


Author’s competing interest statement

All authors were employees of Enveda Biosciences Inc. during the course of this work and have real or potential ownership interest in the company.

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
