Abstract
Gas chromatography coupled with electron impact mass spectrometry (GC‑EI‑MS) is a widely used analytical technique for identifying volatile and semi‑volatile compounds in applications ranging from pharmaceutical research to materials science. However, because not every molecule is included in EI‑MS databases, scientists often have to identify unknown chromatographic peaks solely from their EI‑MS spectra. This manual interpretation is time-consuming and depends heavily on expert knowledge, and it often yields ambiguous or inconclusive results. In this work, we introduce MASSISTANT, a novel deep learning model that predicts de novo molecular structures directly from low‑resolution EI‑MS spectra using SELFIES encoding. MASSISTANT is trained on compounds with molecular weights below 600 Da, and its performance is sensitive to dataset curation: training on the full NIST dataset (180k spectra) yields approximately 10% exact predictions (Tanimoto score = 1), whereas a more focused, chemically homogeneous subset raises this rate to as high as 54%. These results highlight the capability of deep neural networks to capture complex fragmentation patterns and generate chemically valid structures, offering mass spectrometry scientists a powerful tool for elucidating not only whole molecular structures but also substructures and functional groups in GC‑EI‑MS analyses.
Supplementary weblinks
GitHub Repository: GitHub repository containing source code used in our paper.