ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Artificial Applicability Labels for Improving Policies in Retrosynthesis Prediction

preprint
submitted on 05.05.2020 and posted on 11.05.2020 by Esben Jannik Bjerrum, Amol Thakkar, Ola Engkvist
Automated retrosynthetic planning algorithms are a research area of increasing importance. Automated reaction template extraction from large datasets, in conjunction with neural-network-enhanced tree search algorithms, can find plausible routes to target compounds in seconds. However, the current way of training neural networks to predict suitable templates for a given target product leads to many predictions that are not applicable in silico: most of the top-50 suggested templates cannot be applied to the target molecule to perform the virtual reaction. Here we describe how to generate data and train a neural network policy that predicts whether templates are applicable. First, we generate a massive training dataset by applying each retrosynthetic template to each product from our reaction database. Second, we train a neural network to near-perfect prediction of the applicability labels on a held-out test set. The trained network is then joined with a policy model trained to predict and prioritize templates using the labels from the original dataset. The combined model was found to outperform the policy model alone in a route-finding task using 1700 compounds from our internal drug discovery projects.
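
The two steps in the abstract, exhaustive label generation and joining the applicability predictions with the policy, can be illustrated with a minimal sketch. Everything named here is hypothetical: the abstract does not say how the two models are joined, so the thresholded masking below is only one plausible assumption, and `toy_can_apply` is a plain substring check standing in for a real template application performed with a cheminformatics toolkit.

```python
from typing import Callable, List, Sequence


def build_applicability_labels(
    products: Sequence[str],
    templates: Sequence[str],
    can_apply: Callable[[str, str], bool],
) -> List[List[int]]:
    """Apply every template to every product; return a 0/1 label matrix.

    Rows are products, columns are templates. `can_apply(template, product)`
    is a caller-supplied check (hypothetical interface).
    """
    return [[int(can_apply(t, p)) for t in templates] for p in products]


def combine_policy_with_applicability(
    policy_probs: Sequence[float],
    applicability_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[float]:
    """Zero out templates predicted inapplicable, then renormalise.

    ASSUMPTION: masking with a 0.5 threshold is an illustrative choice,
    not necessarily the combination used by the authors.
    """
    masked = [
        p if a >= threshold else 0.0
        for p, a in zip(policy_probs, applicability_probs)
    ]
    total = sum(masked)
    if total == 0.0:
        return masked  # nothing predicted applicable
    return [m / total for m in masked]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring templates, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def toy_can_apply(template: str, product: str) -> bool:
    """TOY stand-in: substring match instead of a real substructure match."""
    return template in product


# Toy data: SMILES-like strings and fragment "templates" (illustrative only).
products = ["CC(=O)OC", "CCN"]
templates = ["C(=O)O", "N", "Br"]
labels = build_applicability_labels(products, templates, toy_can_apply)

# Made-up model outputs for one product, one score per template.
policy_probs = [0.5, 0.3, 0.2]
applicability_probs = [0.1, 0.9, 0.8]
combined = combine_policy_with_applicability(policy_probs, applicability_probs)
ranked = top_k(combined, 2)
```

In this toy case the policy's top-ranked template is removed because the applicability score falls below the threshold, so the search would expand only templates expected to produce a valid virtual reaction.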

Funding

European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 676434, “Big Data in Chemistry”

Email Address of Submitting Author

esben.bjerrum@astrazeneca.com

Institution

AstraZeneca

Country

Sweden

ORCID For Submitting Author

0000-0003-1614-7376

Declaration of Conflict of Interest

No conflicts of interest
