Abstract
Molecular retrosynthesis, an important task in pharmaceutical and chemical engineering, aims to predict candidate reactants for a given product. Treating this challenging task as a conditional generative modeling problem, we propose a hierarchical graph autoregression (HGAR) model and a pretraining-assisted multi-task learning paradigm for it, leading to an effective semi-template molecular retrosynthesis method. Given a product, we first construct a hierarchical graph by connecting the junction tree of its motifs to the atom-level molecular graph. Our HGAR model embeds the hierarchical graph at the motif and atom levels, respectively. The atom-level embeddings are used to predict reaction centers and derive synthons from the product. The motif-level embeddings are used to predict motifs and complete the corresponding synthons autoregressively, yielding the target reactants. We first pretrain the model on PCQM4M-LSC and then fine-tune it on the USPTO retrosynthesis datasets, yielding a model with good generalization power. Experiments show that HGAR outperforms many representative molecular retrosynthesis methods, especially semi-template ones, indicating its feasibility and effectiveness in practice.
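As an informal illustration of the hierarchical graph described above, the following minimal sketch links a motif-level graph to an atom-level graph for a toy molecule. It is not the HGAR implementation: the function name `build_hierarchical_graph`, the hand-specified motif decomposition, and the edge-list representation are all assumptions made for illustration, and the motif-level graph is built by simple adjacency rather than by a full junction-tree algorithm.

```python
from itertools import combinations


def build_hierarchical_graph(bonds, motifs):
    """Toy hierarchical graph (hypothetical helper, not the paper's code).

    bonds:  list of (i, j) atom-index pairs (atom-level graph).
    motifs: list of sets of atom indices; assumed to be given by hand here,
            whereas a real pipeline would derive them from the molecule.
    """
    # Atom level: normalise each bond to (small, large) index order.
    atom_edges = sorted((min(i, j), max(i, j)) for i, j in bonds)
    bond_set = set(atom_edges)

    # Motif level: connect two motifs if they share an atom or a bond joins
    # them. Assumption: the result is acyclic for this toy input; a true
    # junction tree would need an extra spanning-tree step.
    motif_edges = []
    for a, b in combinations(range(len(motifs)), 2):
        shares_atom = bool(motifs[a] & motifs[b])
        linked = any(
            (min(i, j), max(i, j)) in bond_set
            for i in motifs[a]
            for j in motifs[b]
        )
        if shares_atom or linked:
            motif_edges.append((a, b))

    # Cross level: connect each motif node to every atom it contains.
    cross_edges = [
        (m, atom) for m, atoms in enumerate(motifs) for atom in sorted(atoms)
    ]
    return {
        "atom_edges": atom_edges,
        "motif_edges": motif_edges,
        "cross_edges": cross_edges,
    }


# Toy molecule: a six-membered ring (atoms 0-5) with a two-atom chain (6, 7).
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
chain_bonds = [(0, 6), (6, 7)]
graph = build_hierarchical_graph(
    bonds=ring_bonds + chain_bonds,
    motifs=[{0, 1, 2, 3, 4, 5}, {6, 7}],
)
```

In this sketch the two motif nodes are joined because a bond connects the ring to the chain, and each motif node is linked to its member atoms, so a model could in principle pass messages within and across the two levels.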