Spec2Mol: An end-to-end deep learning framework for translating MS/MS Spectra to de-novo molecules

13 September 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Elucidating the structure of a chemical compound is a fundamental task in chemistry, with applications in multiple domains, including the emerging field of metabolomics, which has promising applications in drug discovery, precision medicine, and biomarker discovery. The common practice is to obtain a mass spectrum of the compound and then retrieve its structure from spectral databases. However, database retrieval methods cannot identify novel molecules that are absent from the reference database. In this work, we propose Spec2Mol, a deep learning architecture for molecular structure recommendation given mass spectra alone. Spec2Mol is inspired by the Speech2Text deep learning architectures for translating audio signals into text. Our approach is based on an encoder-decoder architecture: the encoder learns the spectra embeddings, while the decoder, pre-trained on a massive dataset of chemical structures to translate between different molecular representations, reconstructs SMILES sequences of the recommended chemical structures. We evaluated Spec2Mol by assessing the molecular similarity between the recommended structures and the original structure. Our analysis showed that Spec2Mol can identify the presence of key substructures in a molecule from its mass spectrum, and that it performs on par with existing fragmentation-tree-based methods in recommending molecules for a given mass spectrum.
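To make the evaluation idea concrete, the following is a minimal sketch of how recommended structures could be scored against a reference structure. The abstract does not state which similarity measure is used; Tanimoto similarity over fingerprint bit sets is assumed here because it is a common choice in cheminformatics. The function names (`tanimoto`, `rank_candidates`) and the toy bit sets are hypothetical and are not taken from the Spec2Mol code.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def rank_candidates(reference_bits, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(reference_bits, bits))
              for name, bits in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints (hypothetical): each integer stands for one substructure bit.
reference = {1, 4, 7, 9}
candidates = {
    "cand_a": {1, 4, 7, 9},  # same bits as the reference
    "cand_b": {1, 4, 8},     # shares two bits with the reference
    "cand_c": {2, 3},        # shares no bits with the reference
}

ranking = rank_candidates(reference, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the bit sets would come from a fingerprinting routine applied to the SMILES strings; that step is omitted here to keep the sketch free of external dependencies.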

Keywords

Structure elucidation
Deep Learning
MS/MS spectra
Metabolomics
