Revolutionizing Spectroscopic Analysis Using Sequence-to-Sequence Models I: From Infrared Spectra to Molecular Structures

Ethan French; Xianqi Deng; Siqi Chen; Cheng-Wei Ju; Xi Cheng; Lijun Zhang; Xiao Liu; Hui Guan; Zhou Lin

doi:10.26434/chemrxiv-2025-n4q84

Theoretical and Computational Chemistry

Search within Theoretical and Computational Chemistry

Revolutionizing Spectroscopic Analysis Using Sequence-to-Sequence Models I: From Infrared Spectra to Molecular Structures

11 April 2025, Version 1

Working Paper

Show author details

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Infrared (IR) spectroscopy reveals molecular and material features via their characteristic vibrational frequencies in an efficient and sensitive style and has thus become one of the most popular analytical tools in broad areas involving chemical discovery. These fields include material synthesis, drug design, pharmacokinetics, safety screening, pollutant sensing, and observational astronomy. However, in situ molecular or material identification from spectral signals remains a resource-intensive challenge and requires professional training due to its complexity in tracking effects to causes. Motivated by the recent success of sequence-to-sequence (Seq2Seq) models from deep learning, we developed a direct, accurate, effortless and physics-informed protocol to realize such a in-situ spectrum-to-structure translation, and provided the proof-of-concept of our models using IR spectra. We expressed both the input IR spectrum and the output molecular structure as alphanumerical sequences, treated them as two sentences describing the same molecule from two different languages, and translated them into each other using Seq2Seq models from recurrent neural networks (RNNs) and Transformers. Trained and validated using a curated data set of 198,091 organic molecules from the QM9 and PC9 databases, our Seq2Seq models achieved state-of-the-art accuracy of up to 0.611, 0.850, 0.804, and > 0.972 in generating target molecular identities, chemical formulas, structural frameworks, and functional groups from only IR spectra. Our study sets the stage for a revolutionary way to analyze molecular or material spectra by replacing human labor with rapid and accurate deep learning approaches.

Keywords

infrared spectroscopy

molecular structure

backward spectrum-to-structure translation

pre-trained large language model

Transformer

recurrent neural network

Supplementary materials

Title

Description

Actions

Title

Supporting Information (PDF)

Description

Details normal mode analysis; brief descriptions of Seq2Seq models; brief descriptions of data processing protocols; formulas for evaluation metrics; additional results (PDF)

Actions

Supplementary weblinks

Title

Description

Actions

Title

Online Repository

Description

Link to online repository

Actions

View

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here .

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Version History

Apr 11, 2025 Version 1

Metrics

599

177

Views

Downloads

Citations

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv-2025-n4q84

Funding

University of Massachusetts Amherst

Start-Up Fund

University of Massachusetts Amherst

ADVANCE Collaborative Research Seed Grant

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content

Revolutionizing Spectroscopic Analysis Using Sequence-to-Sequence Models I: From Infrared Spectra to Molecular Structures

Authors

Abstract

Keywords

Supplementary materials

Supplementary weblinks

Comments

Version History

Metrics

License

DOI

Funding

Author’s competing interest statement

Ethics

Share