Accurate and Automated de novo Identification of Molecular Functional Groups Using Deep Learning Architectures

We present a deep learning method for identifying all the functional groups of unknown compounds from a combination of FTIR and MS spectra, without the use of any database, pre-established rules, procedures, or peak-matching methods. We derive patterns and correlations directly from the spectral data, treating the identification of multiple functional groups as a multi-class, multi-label problem. For practical usability, we introduce two new metrics, the Molecular F1 score and the Molecular Perfection rate, which measure performance at the level of whole molecules, that is, by how well all functional groups on each molecule are identified. Our optimized model achieves a Molecular F1 score of 0.92 and a Molecular Perfection rate of 72%. Backpropagation through our model reveals IR patterns typically used by human chemists, suggesting “learning” of known spectral features. We show that the introduction of new functional groups does not decrease model performance. Finally, we show redundancy in FTIR and MS data by encoding the combined data in a latent space that retains the accuracy of the original model. Our results reveal the importance of deep learning for the rapid identification of functional groups toward realizing autonomous analytical processes in the future.
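The abstract names the two molecule-level metrics but does not define them. The following is a minimal sketch that makes the idea concrete, assuming (1) the Molecular F1 score is the F1 between a molecule's predicted and true functional-group sets, averaged over molecules, and (2) the Molecular Perfection rate is the fraction of molecules whose predicted set matches the true set exactly, with no missed groups and no false positives. Both definitions, the function names, and the group labels are hypothetical illustrations, not necessarily the authors' exact formulas.

```python
from typing import Sequence, Set


def molecular_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """ASSUMED definition: mean per-molecule F1 of predicted vs. true group sets."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    scores = []
    for true, pred in zip(y_true, y_pred):
        if not true and not pred:
            scores.append(1.0)  # nothing to find and nothing predicted
            continue
        tp = len(true & pred)  # correctly identified functional groups
        # F1 = 2TP / (2TP + FP + FN), which simplifies to 2TP / (|true| + |pred|)
        scores.append(2 * tp / (len(true) + len(pred)))
    return sum(scores) / len(scores)


def molecular_perfection_rate(
    y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]
) -> float:
    """ASSUMED definition: fraction of molecules with every group predicted exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label lists")
    exact = [true == pred for true, pred in zip(y_true, y_pred)]
    return sum(exact) / len(exact)


# Hypothetical example: three molecules with multi-label functional-group annotations.
y_true = [{"alcohol", "alkene"}, {"ketone"}, {"amine", "aromatic"}]
y_pred = [{"alcohol", "alkene"}, {"ketone", "ester"}, {"aromatic"}]
print(round(molecular_f1(y_true, y_pred), 4))  # (1 + 2/3 + 2/3) / 3 -> 0.7778
print(round(molecular_perfection_rate(y_true, y_pred), 4))  # 1 of 3 exact -> 0.3333
```

Under these assumed definitions, the perfection rate is the stricter metric: a single extra or missing group (the second and third molecules above) still earns partial F1 credit but counts as a failure for perfection, which is consistent with the reported perfection rate being lower than the F1 score.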