Abstract
We present RetroTRAE, a new retrosynthesis prediction method that combines fragment-based tokenization with the Transformer architecture. RetroTRAE mimics chemical reasoning: it predicts reactant candidates by learning the changes in atom environments associated with a chemical reaction. Atom environments are ideal, chemically meaningful building blocks that together yield a high-resolution molecular representation. Because atoms are conserved in a reaction, describing a molecule as a set of atom environments establishes a clear relationship between the product and the reactants in each translated pair. Our model achieved a top-1 accuracy of 68.1% within the bioactively similar range on the USPTO test dataset, outperforming other state-of-the-art translation methods. Besides its high overall accuracy, the proposed method effectively resolves the translation issues that arise in SMILES-based retrosynthesis planning methods. Through careful inspection of reactant candidates, we show that atom environments are promising descriptors for studying reaction route prediction and discovery. RetroTRAE provides fast and reliable retrosynthetic route planning for substances whose fragmentation patterns have been revealed. Our methodology offers a novel way of devising a retrosynthetic planning model that uses fragmental and topological descriptors as natural inputs for chemical translation tasks.
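
To make the set-of-atom-environments description concrete, the following is a minimal sketch rather than the RetroTRAE implementation. It assumes a toy molecular graph over heavy atoms only, ignores bond orders, and supports only radius 0 (the atom itself) and radius 1 (the atom plus its direct neighbors). The function names `atom_environments` and `tanimoto` are hypothetical, and the Tanimoto-style set overlap is an assumed similarity measure, since the abstract does not state which metric defines the bioactively similar range.

```python
from collections import Counter


def atom_environments(atoms, bonds, radius=1):
    """Count the atom environments of a toy molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs (bond orders are ignored here)
    radius: 0 for the bare atom, 1 for the atom plus its direct neighbors
    """
    if radius not in (0, 1):
        raise ValueError("this sketch only supports radius 0 or 1")

    # Build an undirected adjacency list from the bond list.
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    envs = Counter()
    for i, symbol in enumerate(atoms):
        if radius == 0:
            envs[symbol] += 1
        else:
            # Sort neighbor symbols so the label does not depend on atom order.
            shell = sorted(atoms[j] for j in neighbors[i])
            envs[f"{symbol}({','.join(shell)})"] += 1
    return envs


def tanimoto(envs_a, envs_b):
    """Tanimoto (Jaccard) overlap between two collections of environments."""
    set_a, set_b = set(envs_a), set(envs_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Heavy-atom graphs: ethanol (C-C-O) and acetic acid (C-C(=O)-O).
ethanol = atom_environments(["C", "C", "O"], [(0, 1), (1, 2)])
acetic_acid = atom_environments(["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])

print(sorted(ethanol))                  # ['C(C)', 'C(C,O)', 'O(C)']
print(tanimoto(ethanol, acetic_acid))   # 0.5
```

In this toy case the two molecules share the `C(C)` and `O(C)` environments and differ only at the carbon that changes during oxidation, which illustrates how a reaction appears as a small, localized change in the set of atom environments.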