Artificial Applicability Labels for Improving Policies in Retrosynthesis Prediction
Automated retrosynthetic planning algorithms are
a research area of increasing importance. Automated reaction template extraction
from large datasets in conjunction with neural-network-enhanced tree search
algorithms can find plausible routes to target compounds in seconds. However,
the current way of training the neural networks to predict suitable templates
for a given target product leads to many predictions that are not applicable in silico: most of the top-50
suggested templates cannot be applied to the target molecule to perform the
virtual reaction. Here we describe how to generate data and train a neural
network policy that predicts if templates are applicable or not. First, we
generate a massive training dataset by applying each retrosynthetic template to
each product from our reaction database. Second, we train a neural network to
near-perfect prediction of the applicability labels on a held-out test set. The
trained network is then joined with a policy model trained to predict and
prioritize templates using the labels from the original dataset. The combined
model was found to outperform the policy model alone in a route-finding task using
1700 compounds from our internal drug discovery projects.
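
The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 threshold, and the masking-and-renormalizing rule used to join the two models are assumptions made for illustration, since the abstract does not state how the networks are combined. The applicability check is a toy substring match standing in for actually running a reaction template on a product with a cheminformatics toolkit.

```python
from typing import Callable, List, Sequence


def applicability_labels(
    products: Sequence[str],
    templates: Sequence[str],
    applies: Callable[[str, str], bool],
) -> List[List[int]]:
    """Apply every template to every product and record the outcome.

    labels[i][j] is 1 if template j can be applied to product i, else 0.
    """
    return [[1 if applies(t, p) else 0 for t in templates] for p in products]


def combine(
    policy_probs: Sequence[float],
    applicability_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[float]:
    """Zero out templates predicted to be inapplicable, then renormalize.

    This is one possible way to join the two models (an assumption).
    If every template is masked, the original policy is returned unchanged.
    """
    masked = [
        p if a >= threshold else 0.0
        for p, a in zip(policy_probs, applicability_probs)
    ]
    total = sum(masked)
    if total == 0:
        return list(policy_probs)
    return [m / total for m in masked]


def toy_applies(template: str, product: str) -> bool:
    """Toy stand-in: a 'template' applies if its pattern occurs in the string."""
    return template in product


products = ["CCO", "CC(=O)O"]
templates = ["O", "=O", "N"]
labels = applicability_labels(products, templates, toy_applies)

# Policy scores for three templates and predicted applicability for each.
combined = combine([0.5, 0.3, 0.2], [0.9, 0.1, 0.8])
```

In this sketch the second template, which the policy ranks highly but the applicability model rejects, receives zero weight, so the tree search would not spend an expansion on it.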