Improved Selection of Rare Reactions in Template-Based Retrosynthesis Predictions

21 December 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Identifying synthetic routes for molecules of interest is a crucial step in discovering new drugs or materials. Such routes can be found with computer-assisted synthesis planning, in which expansion policy networks are trained on reaction templates extracted from patents and the literature. However, experience has shown that these networks are biased towards frequently reported reactions. This study shows that changing the molecular representation from an extended-connectivity fingerprint to a simple graph representation can increase the accuracy for templates used fewer than five times by 5.0-8.5 percentage points. We also show that simple oversampling of the training set yielded a top-1 accuracy increase of 17-20 percentage points for templates used five times or fewer.
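
The abstract does not say how the oversampling was carried out, so the following is only a minimal sketch of one common form: duplicating training examples of rare templates until each template reaches a minimum count. The function name `oversample_rare_templates`, the `(product_smiles, template_id)` pair layout, the toy data, and the threshold `min_count=5` (borrowed from the five-use cutoff mentioned above) are all illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter


def oversample_rare_templates(samples, min_count=5, seed=0):
    """Duplicate examples of rare templates until each has min_count examples.

    samples: list of (product_smiles, template_id) pairs (hypothetical layout).
    Returns a new shuffled list; the input list is left unchanged.
    """
    rng = random.Random(seed)

    # Group the training examples by the template they are labelled with.
    by_template = {}
    for sample in samples:
        by_template.setdefault(sample[1], []).append(sample)

    result = list(samples)
    for group in by_template.values():
        shortfall = min_count - len(group)
        if shortfall > 0:
            # Sample with replacement from the existing examples of this template.
            result.extend(rng.choices(group, k=shortfall))

    rng.shuffle(result)
    return result


# Toy training set: T1 is frequent, T2 and T3 are rare (illustrative data only).
training_set = [
    ("CCO", "T1"), ("CCN", "T1"), ("CCC", "T1"),
    ("CCCl", "T1"), ("CCBr", "T1"), ("CCF", "T1"),
    ("c1ccccc1O", "T2"),
    ("CC(=O)O", "T3"), ("CC(=O)N", "T3"),
]

balanced = oversample_rare_templates(training_set, min_count=5)
counts = Counter(template_id for _, template_id in balanced)
```

In a sketch like this, oversampling would be applied to the training split only, so that validation and test accuracy for rare templates are still measured on unduplicated examples.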
