Abstract
This work introduces fragSMILES, a fragment-based chemical language, into deep-learning reaction prediction. The notation encodes molecular substructures and chirality, which yields compact and expressive string representations. We systematically compared it with well-established molecular notations: the Simplified Molecular Input Line Entry System (SMILES), Self-Referencing Embedded Strings (SELFIES), and Sequential Attachment-based Fragment Embedding (SAFE). fragSMILES achieved the highest performance in both forward- and retrosynthesis prediction and recognized stereochemical reaction information better than the other notations. This improved handling of stereochemical complexity addresses a key challenge in synthesis planning. Our results show that chirality-aware, fragment-level representations can advance current computer-assisted synthesis planning efforts.
Supplementary materials
Electronic supporting information: methods, figures and tables are included in this file.