Exploring the Chemical Subspace of RPLC: a Data Driven Approach

Denice van Herwerden; Alexandros Nikolopoulos; Leon Barron; Jake O'Brien; Bob Pirok; Kevin Thomas; Saer Samanipour

doi:10.26434/chemrxiv-2023-bdwh0-v2

Analytical Chemistry

25 September 2023, Version 2

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The chemical space comprises a vast number of possible structures, of which an unknown portion makes up the human and environmental exposome. Such samples are frequently analyzed by non-targeted analysis using liquid chromatography (LC) coupled to high-resolution mass spectrometry, often with a reversed-phase (RP) column. However, the contents of these samples are unknown prior to analysis and could comprise thousands of known and unknown chemical constituents. Moreover, it is unknown which part of the chemical space is sufficiently retained and eluted using RPLC. Therefore, we present a generic framework that uses a data-driven approach to predict whether molecules fall "inside", "maybe" inside, or "outside" of the RPLC subspace. First, three retention index random forest (RF) regression models were constructed, which showed that molecular fingerprints are able to predict RPLC retention behavior. Second, these models were used to set up the dataset for building an RPLC RF classification model. The RPLC classification model correctly predicted whether a chemical belonged to the RPLC subspace with an accuracy of 92% for the testing set. Finally, applying this model to the 91737 small molecules (i.e., <=1000 Da) in NORMAN SusDat showed that 19.1% fall "outside" of the RPLC subspace. Knowing which chemicals are outside of the RPLC subspace can help reduce the number of potential candidates for library searching and avoid screening for chemicals that will not be present in RPLC data.
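The candidate-reduction use case mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate names, the `rplc_class` field, and the helper functions are hypothetical, and the labels are made up for illustration rather than produced by the RF classification model. The last line only gives a sense of scale for the reported 19.1% of 91737 molecules; because 19.1% is a rounded value, the resulting count is approximate.

```python
from collections import Counter

# Hypothetical library-search candidates. The class labels are invented
# for illustration; in practice they would come from a trained classifier.
candidates = [
    {"name": "candidate_A", "rplc_class": "inside"},
    {"name": "candidate_B", "rplc_class": "maybe"},
    {"name": "candidate_C", "rplc_class": "outside"},
    {"name": "candidate_D", "rplc_class": "inside"},
]


def filter_candidates(candidates, drop=("outside",)):
    """Keep candidates whose predicted class is not listed in `drop`."""
    return [c for c in candidates if c["rplc_class"] not in drop]


def class_fractions(candidates):
    """Return the fraction of candidates assigned to each class."""
    counts = Counter(c["rplc_class"] for c in candidates)
    total = len(candidates)
    return {label: n / total for label, n in counts.items()}


kept = filter_candidates(candidates)
fractions = class_fractions(candidates)

# Sense of scale for the reported result (approximate, since 19.1% is rounded).
approx_outside = round(0.191 * 91737)
```

In this sketch the "maybe" class is kept by default, which is a conservative assumption; passing `drop=("outside", "maybe")` would give a stricter filter.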

Keywords

Chemical space

Non-targeted analysis

data driven

reversed-phase liquid chromatography

High-resolution mass spectrometry

RPLC subspace

Supplementary materials

Title: Supporting information

Description: Overview of the performance of different types of molecular fingerprints; the composition of the reduced PubChem fingerprints; optimization, prediction, leverage, and feature importance results for the three RF regression models and the RPLC classification model; and the RPLC classification of NORMAN SusDat, visualized by plotting XLogP values versus the predicted retention index (ri) values for the three ri regression models.

Version History

Jan 23, 2024 Version 3

Sep 25, 2023 Version 2

Sep 21, 2023 Version 1

Version Notes

Corrected the number of NORMAN SusDat chemicals in Figure 1 to 91737.

Metrics

Views: 1,435

Downloads: 685

License

The content is available under CC BY 4.0

Funding

National Health and Medical Research Council

EL1 2009209

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
