Abstract
In this work we detail an automated protocol for generating reaction network hypotheses for processes involving complex feedstocks, where the species and reactions involved are unknown. Our methodology is process-agnostic and can be applied to any reactive process for which spectroscopic measurements provide information on the evolution of the components in the mixture. We decompose the mixture spectra to obtain spectroscopic signatures of the individual components and use a 1D convolutional neural network to automatically identify the functional groups they indicate. We employ atom-atom mapping to automatically recover reaction rules, which are then applied to candidate molecules identified from chemistry databases through fingerprint similarity. The method is tested on synthetic data and on spectroscopic measurements of lab-scale batch hydrothermal liquefaction (HTL) of biomass to determine the accuracy of prediction across datasets of varying complexity. Our methodology identifies reaction network hypotheses that include networks close to the ground truth for the synthetic data, and it also recovers candidate molecules and reaction networks close to those reported in previous literature studies of biomass pyrolysis.