Retrosynthesis Prediction using Grammar-based Neural Machine Translation: An Information-Theoretic Approach

Vipul Mann; Venkat Venkatasubramanian

doi:10.26434/chemrxiv.14410442.v1

Theoretical and Computational Chemistry

15 April 2021, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Retrosynthetic prediction, one of the main challenges in chemical synthesis, requires identifying reaction pathways and precursor molecules for synthesizing a target molecule. This requires a search over the space of plausible chemical reactions that often results in complex, multi-step, branched synthesis trees for even moderately complex organic reactions. Here, we propose an approach that performs single-step retrosynthesis prediction using SMILES grammar-based representations in a neural machine translation framework. Information-theoretic analyses of such grammar-based representations reveal that they are both superior and well-suited for machine learning tasks due to their underlying redundancy and high information capacity compared to purely character-based representations. We report a top-1 prediction accuracy of 43.8% (top-5 measure of 61.4%) and syntactic validity of 95.6% (top-5 measure of 91.6%) on a standard reaction dataset. Comparing our model's performance with previous work that used purely character-based SMILES representations demonstrates improved accuracy and fewer grammatically invalid predictions.
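The abstract does not state which information-theoretic measures were used, so the following is only a minimal sketch of the standard Shannon quantities that terms such as "redundancy" and "information capacity" usually refer to. The function names, the example molecule (aspirin), and the character-level tokenization are illustrative assumptions; the grammar-based tokenization proposed in the paper is not reproduced here.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Empirical Shannon entropy of a token sequence, in bits per token."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def redundancy(tokens):
    """Shannon redundancy 1 - H / H_max, where H_max = log2(vocabulary size).

    Returns 0.0 when fewer than two distinct tokens occur, because H_max is
    zero there and the ratio is undefined (a convention chosen for this sketch).
    """
    vocab_size = len(set(tokens))
    if vocab_size < 2:
        return 0.0
    return 1.0 - shannon_entropy(tokens) / math.log2(vocab_size)

# Assumption: a purely character-based view of one SMILES string (aspirin).
# Splitting by character is adequate here only because the string contains
# no two-letter atoms such as Cl or Br.
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
char_tokens = list(smiles)

print(f"entropy:    {shannon_entropy(char_tokens):.3f} bits/token")
print(f"redundancy: {redundancy(char_tokens):.3f}")
```

To compare representations in this spirit, the same two functions could be applied to a corpus tokenized once by character and once by grammar production rule, with the caveat that a single molecule is far too small a sample for meaningful estimates.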

Keywords

reaction prediction

retrosynthetic methods

SMILES strings

Information theory

Computer Aided Synthesis Planning

sequence to sequence (Seq2Seq)

Version History

Aug 26, 2021 Version 2

Apr 15, 2021 Version 1

Metrics

Views: 2,005

Downloads: 1,023

License

The content is available under CC BY NC ND 4.0

DOI

10.26434/chemrxiv.14410442.v1

Funding

Center for the Management of Systemic Risk (CMSR), Columbia University, New York

Author’s competing interest statement

The authors declare no conflict of interest.
