Abstract
Finding relevant chemicals in the vast (known) chemical space is a major challenge for environmental and exposomics studies that rely on non-target high-resolution mass spectrometry (NT-HRMS) methods. Chemical databases now contain hundreds of millions of chemicals, yet many of these are not relevant to such studies. This article details an extensive collaborative, open science effort to provide a dynamic collection of chemicals for environmental, metabolomics and exposomics research, along with supporting information about their relevance to assist researchers in interpreting candidate hits. The PubChemLite for Exposomics collection is compiled from ten annotation categories within PubChem and enhanced with patent, literature and annotation counts, predicted partition coefficient (logP) values, and predicted collision cross section (CCS) values from CCSbase. Monthly versions are archived on Zenodo under a CC-BY license, supporting reproducible research. A new interface, which includes chemical stripes summarizing patent and literature data, has been developed so that researchers can browse the collection. This article further describes how PubChemLite can support researchers in environmental and exposomics studies, describes efforts to increase the availability of experimental CCS values, and explores known limitations and potential for future developments. The data and code behind these efforts are openly available. PubChemLite content can be explored at https://pubchemlite.lcsb.uni.lu.
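As a rough illustration of how a compact collection with literature counts can help interpret candidate hits, the sketch below looks up entries by neutral monoisotopic mass within a ppm tolerance and ranks them by literature count. It is a minimal sketch under stated assumptions, not the workflow used in this article: the column names (`Identifier`, `CompoundName`, `MonoisotopicMass`, `PubMed_Count`) are assumed and should be checked against the header of the downloaded CSV, the function `find_candidates` is hypothetical, and the counts in the sample rows are placeholders rather than real PubChemLite values.

```python
import csv
import io

# Tiny in-memory stand-in for a PubChemLite CSV file.
# Column names are assumptions; the PubMed_Count values are placeholders.
SAMPLE_CSV = """Identifier,CompoundName,MonoisotopicMass,PubMed_Count
2519,Caffeine,194.0804,10
5429,Theobromine,180.0647,20
2153,Theophylline,180.0647,30
"""


def find_candidates(rows, query_mass, ppm=5.0):
    """Return rows within +/- ppm of query_mass, most-cited first.

    query_mass is a neutral monoisotopic mass, i.e. any adduct
    correction is assumed to have been applied already.
    """
    tolerance = query_mass * ppm / 1e6
    hits = [
        row for row in rows
        if abs(float(row["MonoisotopicMass"]) - query_mass) <= tolerance
    ]
    # Rank isobaric candidates by literature count, highest first.
    return sorted(hits, key=lambda row: int(row["PubMed_Count"]), reverse=True)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
candidates = find_candidates(rows, 180.0647)
for row in candidates:
    print(row["Identifier"], row["CompoundName"], row["PubMed_Count"])
```

With a real download, the `io.StringIO` stand-in would be replaced by an opened file handle. Ranking by literature count is only one possible heuristic; patent counts, annotation counts or predicted CCS values could be combined in the same way.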
Supplementary weblinks
Title
PubChemLite CSV Download (monthly updates)
Description
This DOI redirects to the latest version of PubChemLite (CSV file), which is typically updated on the last Friday of each month.
Title
C3SDB Code
Description
The code used to predict the collision cross section (CCS) values integrated in this article.
Title
PubChemLite-CCS CSV Download (monthly updates)
Description
This DOI redirects to the latest version of the PubChemLite-CCS dataset containing the predicted CCS values from CCSbase.
Title
PubChemLite Web Interface
Description
The web interface for PubChemLite (including the collision cross section values).
Title
PubChemLite Web Interface Code
Description
The code repository for the PubChemLite web interface.
Title
Chemical Stripes Code
Description
The repository containing the chemical stripes code.
Title
PubChemLite Build System Code
Description
Code repository for the PubChemLite Build System. The corresponding input files are here: https://gitlab.com/uniluxembourg/lcsb/eci/pubchemlite-input
Title
PubChem FTP Site
Description
The data to compile PubChemLite are obtained from the PubChem FTP site.