Abstract
Mass spectral reference libraries are fundamental tools for compound identification in electron-ionization mass spectrometry (EI-MS). However, the inherent complexity of mass spectra and the lack of a direct correlation between spectral similarity and structural similarity make structure elucidation and accurate peak annotation difficult. To address these challenges, we introduce an approach that combines CFM-EI, a tool for modeling fragmentation likelihood in EI-MS data, with a multi-step complexity reduction strategy for mass-to-fragment mapping. The method uses modified atomic environments to represent the fragment ions of very small organic molecules and trains a transformer model to predict the structural content of a compound from its mass and intensity data. The resulting workflow aids the interpretation of EI-MS data by providing insight into the atom types present, and it refines cosine similarity rankings by suggesting which atom types to include or exclude. Tests on EI-MS data from the NIST database show that the approach complements conventional methods, improving spectral matching through atomic-level analysis.