Molecular Transformer-aided Biocatalysed Synthesis Planning

24 May 2021, Version 1


Enzyme catalysts are an integral part of green chemistry strategies aimed at more sustainable and resource-efficient chemical synthesis. However, applying enzymes to unreported substrates, and anticipating their specific stereo- and regioselectivity, relies on domain-specific knowledge that takes decades of field experience to master. This makes the retrosynthesis of given targets with biocatalysed reactions a significant challenge. Here, we use the molecular transformer architecture to capture the latent knowledge about enzymatic activity from a large data set of publicly available biochemical reactions, extending forward reaction and retrosynthetic pathway prediction to the domain of biocatalysis. We introduce a class token based on the EC classification scheme that allows the model to capture catalysis patterns shared among different enzymes belonging to the same hierarchical families. The forward prediction model achieves top-1 and top-5 accuracies of 49.6% and 62.7%, respectively, while the single-step retrosynthetic model shows top-1 and top-10 round-trip accuracies of 39.6% and 42.6%, respectively. Trained models and curated data are made publicly available with the hope of promoting enzymatic catalysis and making green chemistry more accessible through the use of digital technologies.
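To illustrate the idea of an EC-based class token, the following minimal Python sketch shows one way a hierarchical EC number could be truncated and prepended to a substrate SMILES string before it is passed to a sequence model. The token format (`[EC1:1]`, `[EC2:1]`, ...), the function names, and the default truncation level are assumptions made for illustration only; they are not taken from the abstract and may differ from the format used by the authors.

```python
def ec_class_tokens(ec_number: str, level: int = 3) -> list[str]:
    """Return one token per EC hierarchy level, truncated to `level` levels.

    The "[EC<depth>:<value>]" token format is hypothetical.
    """
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    parts = ec_number.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a four-part EC number, got {ec_number!r}")
    # Keep only the first `level` parts so that enzymes from the same
    # family (same leading digits) share the same class tokens.
    return [f"[EC{depth}:{value}]" for depth, value in enumerate(parts[:level], start=1)]


def tag_reaction(substrates: str, ec_number: str, level: int = 3) -> str:
    """Prepend the EC class tokens to a substrate SMILES string (illustrative format)."""
    tokens = " ".join(ec_class_tokens(ec_number, level))
    return f"{tokens} {substrates}"


if __name__ == "__main__":
    # Ethanol with alcohol dehydrogenase (EC 1.1.1.1), truncated to three levels.
    print(tag_reaction("CCO", "1.1.1.1"))
```

Truncating the EC number means that two enzymes sharing the same leading levels receive identical tokens, which is one plausible way for a model to share catalysis patterns within a family.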

