Abstract
The use of enzymes in organic synthesis enables
simpler, more economical, and more selective synthetic routes that are not
accessible with conventional reagents. However, predicting whether a particular
molecule will undergo a specific enzymatic transformation is very difficult. Here we
exploited recent advances in computer-assisted synthetic planning (CASP) by
using the Molecular Transformer, a sequence-to-sequence machine
learning model that can be trained to predict the products of organic
transformations, including their stereochemistry, from the structure of
reactants and reagents. We used multi-task transfer learning to train the Molecular
Transformer on one million reactions from the US Patent Office (USPTO)
database as a source of general chemistry knowledge, combined with 32,000 enzymatic
transformations, each annotated with a text description of the enzyme. We
show that the resulting Enzymatic Transformer model predicts the products
formed from a given substrate and enzyme with remarkable accuracy, including
typical kinetic resolution processes.
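
As an illustration of the input format implied above, the following minimal sketch shows one way a substrate and an enzyme text description could be combined into a single source sequence, and how general and enzymatic reactions could be interleaved for multi-task training. The tokenization pattern, the `|` separator, the function names, and the sampling fraction are assumptions made for this example and are not taken from the abstract.

```python
import random
import re

# Atom-level SMILES token pattern of the kind commonly used with
# sequence-to-sequence reaction models (assumed; not specified in the text).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, failing on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES: {smiles!r}")
    return tokens


def build_source(substrate_smiles, enzyme_text, sep="|"):
    """Join SMILES tokens and enzyme description words into one sequence.

    The separator token is hypothetical and only marks where the
    chemical structure ends and the text description begins.
    """
    words = enzyme_text.lower().split()
    return " ".join(tokenize_smiles(substrate_smiles) + [sep] + words)


def mix_tasks(general, enzymatic, n_samples, enzymatic_fraction=0.5, seed=0):
    """Draw training examples from both tasks (illustrative sampling scheme)."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n_samples):
        pool = enzymatic if rng.random() < enzymatic_fraction else general
        batch.append(rng.choice(pool))
    return batch
```

For example, `build_source("CC(=O)OCC", "carboxylesterase")` yields a space-separated token string that a sequence-to-sequence model could consume as its source side, with the product SMILES tokenized in the same way as the target.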