Retrosynthesis Prediction using Grammar-based Neural Machine Translation: An Information-Theoretic Approach

Vipul Mann; Venkat Venkatasubramanian

doi:10.26434/chemrxiv-2021-l755t-v2

Chemical Engineering and Industrial Chemistry

26 August 2021, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Retrosynthetic prediction is one of the main challenges in chemical synthesis because it requires a search over the space of plausible chemical reactions that often results in complex, multi-step, branched synthesis trees for even moderately complex organic reactions. Here, we propose an approach that performs single-step retrosynthesis prediction using SMILES grammar-based representations in a neural machine translation framework. Information-theoretic analyses of such grammar representations reveal that they are superior to SMILES representations and are better suited for machine learning tasks due to their underlying redundancy and high information capacity. We report a top-1 prediction accuracy of 43.8% (syntactic validity 95.6%) and a maximal fragment (MaxFrag) accuracy of 50.4%. Comparing our model’s performance with previous work that used character-based SMILES representations demonstrates a significant reduction in grammatically invalid predictions and improved prediction accuracy. Fewer invalid predictions for both known and unknown reaction class scenarios demonstrate the model’s ability to learn the underlying SMILES grammar efficiently.
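The two accuracy figures quoted above measure different things: top-1 accuracy counts a prediction as correct only when the full reactant set matches, while MaxFrag accuracy compares only the largest reactant fragment. A minimal sketch of how such metrics could be computed follows. It is not the authors' evaluation code. It assumes that every SMILES string has already been canonicalized by an external toolkit, and it approximates "largest fragment" by string length rather than by atom count. The function names `max_fragment` and `top1_and_maxfrag` are hypothetical.

```python
from typing import Sequence, Tuple


def max_fragment(smiles: str) -> str:
    """Return the longest '.'-separated fragment of a SMILES string.

    Assumption: string length stands in for molecular size; ties are
    broken lexicographically so the result is deterministic.
    """
    return max(smiles.split("."), key=lambda frag: (len(frag), frag))


def top1_and_maxfrag(
    predictions: Sequence[str], targets: Sequence[str]
) -> Tuple[float, float]:
    """Compute (top-1 accuracy, MaxFrag accuracy) over paired SMILES.

    Assumption: inputs are already canonical, so plain string equality
    is a valid test of molecular identity.
    """
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")

    exact = 0
    maxfrag = 0
    for pred, true in zip(predictions, targets):
        # Exact match ignores the order in which reactants are listed.
        if sorted(pred.split(".")) == sorted(true.split(".")):
            exact += 1
        # MaxFrag match looks only at the largest reactant.
        if max_fragment(pred) == max_fragment(true):
            maxfrag += 1

    n = len(targets)
    return exact / n, maxfrag / n


if __name__ == "__main__":
    preds = ["CC(=O)O.OCC", "O=C(Cl)c1ccccc1.N", "CCO", "CCBr"]
    trues = ["OCC.CC(=O)O", "O=C(Cl)c1ccccc1.CN", "CCN.CC(=O)O", "CCBr"]
    print(top1_and_maxfrag(preds, trues))
```

In the toy data, the second pair gets the main reactant right but the small co-reactant wrong, so it counts toward MaxFrag accuracy and not toward top-1 accuracy. This is why a MaxFrag score is never lower than the corresponding top-1 score under this definition.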

Keywords

reaction prediction

retrosynthetic methods

SMILES strings

Information theory

Computer Aided Synthesis Planning

sequence to sequence (Seq2Seq)

Now Published

Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach

Vipul Mann, Venkat Venkatasubramanian (journal article)

Computers & Chemical Engineering, Volume 155

Print publication date: Dec, 2021

Version History

Aug 26, 2021 Version 2

Apr 15, 2021 Version 1

Version Notes

Updated with additional results.

License

The content is available under CC BY NC ND 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
