Artificial Applicability Labels for Improving Policies in Retrosynthesis Prediction

Esben Jannik Bjerrum; Amol Thakkar; Ola Engkvist

doi:10.26434/chemrxiv.12249458.v1

Theoretical and Computational Chemistry

11 May 2020, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Automated retrosynthetic planning algorithms are a research area of increasing importance. Automated reaction template extraction from large datasets, in conjunction with neural-network-enhanced tree search algorithms, can find plausible routes to target compounds in seconds. However, the current way of training the neural networks to predict suitable templates for a given target product leads to many predictions that are not applicable in silico: most of the top-50 suggested templates cannot be applied to the target molecule to perform the virtual reaction. Here we describe how to generate data and train a neural network policy that predicts whether templates are applicable. First, we generate a massive training dataset by applying each retrosynthetic template to each product from our reaction database. Second, we train a neural network to near-perfect prediction of the applicability labels on a held-out test set. The trained network is then joined with a policy model trained to predict and prioritize templates using the labels from the original dataset. The combined model was found to outperform the policy model alone in a route-finding task using 1700 compounds from our internal drug discovery projects.
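The two steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy substring check standing in for a real template application, and the choice to join the two models by multiplying their scores are all assumptions made for illustration. The abstract does not state how the networks are combined.

```python
from typing import Callable, List, Sequence


def build_applicability_labels(
    products: Sequence[str],
    templates: Sequence[str],
    is_applicable: Callable[[str, str], bool],
) -> List[List[int]]:
    """Return a products x templates matrix of 0/1 applicability labels.

    Every template is tried against every product, mirroring the
    exhaustive labelling step described in the abstract.
    """
    return [
        [1 if is_applicable(template, product) else 0 for template in templates]
        for product in products
    ]


def combine_and_rank(
    policy_probs: Sequence[float],
    applicability_probs: Sequence[float],
    top_k: int = 50,
) -> List[int]:
    """Rank template indices by policy score times applicability score.

    Multiplying the two scores is an assumed combination rule.
    """
    scores = [p * a for p, a in zip(policy_probs, applicability_probs)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def toy_is_applicable(template: str, product: str) -> bool:
    """Hypothetical stand-in: the template "applies" if its pattern string
    occurs in the product string. A real pipeline would run the reaction
    template with a cheminformatics toolkit instead."""
    return template in product


# Toy data, invented for the example.
products = ["CC(=O)NC", "CCOC(C)=O"]
templates = ["C(=O)N", "OC(C)=O", "Br"]
labels = build_applicability_labels(products, templates, toy_is_applicable)

# Made-up policy output for the first product; the labels themselves are
# used in place of a trained applicability network.
policy_probs = [0.2, 0.5, 0.3]
applicability_probs = [float(x) for x in labels[0]]
ranking = combine_and_rank(policy_probs, applicability_probs, top_k=2)
```

In this toy run the policy alone would rank the second template first, but it cannot be applied to the first product, so the combined score moves the applicable first template to the top of the ranking.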

Keywords

retrosynthesis prediction

Neural Network Prediction

tree search algorithm

reaction rules

data augmentation

Now Published

Artificial applicability labels for improving policies in retrosynthesis prediction

Esben Jannik Bjerrum, Amol Thakkar, Ola Engkvist (journal article)

Machine Learning: Science and Technology, Volume 2, Issue 1

Online publication date: Dec 24, 2020

Version History

May 11, 2020 Version 1

License

The content is available under CC BY NC ND 4.0

Funding

European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 676434, “Big Data in Chemistry”

Author’s competing interest statement

No conflicts of interest.
