Spectro: A multi-modal approach for molecule elucidation using IR and NMR data

06 November 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Molecular structure elucidation is a crucial but fundamentally challenging step in the characterization of materials given the large number of possible structures. Here, we introduce Spectro, an innovative multi-modal approach for molecular elucidation that combines $\CNMR$ and $\HNMR$ NMR data with IR. Spectro translates the embedded representations of the spectra into molecular structures using the SELFIES notation. We employed a vision model for the embedded representation of the IR data, which was pretrained to detect relevant functional group peaks in the IR spectra achieving an F1 score of 91\%. For NMR data, we utilized LLM2Vec, treating the NMR spectra as text. This integration of multiple spectroscopic techniques allows Spectro to achieve an overall test accuracy of 93\% when trained jointly with the vision model for the IR spectra, and 82\% when trained with fixed embeddings. Our approach demonstrates the potential of multi-modal learning in tackling complex molecular characterization tasks.

Keywords

vision models
molecule-to-spectra
selfies

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy [opens in a new tab] - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here [opens in a new tab] .
This site is protected by reCAPTCHA and the Google Privacy Policy [opens in a new tab] and Terms of Service [opens in a new tab] apply.