Fast Prediction of Distances Between Synthetic Routes with Deep Learning

15 June 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We extend our recent work on clustering synthesis routes by training a deep learning model to predict the distance between arbitrary routes. The model is based on a long short-term memory (LSTM) representation of a synthesis route and is trained as a twin network to reproduce the tree edit distance (TED) between two routes. The machine learning (ML) approach is approximately two orders of magnitude faster than computing the TED directly, which makes it possible to cluster many more of the routes produced by a retrosynthesis route prediction. The resulting clusters are highly similar to those given by the TED-based approach and are therefore equally intuitive and explainable. The developed model is available as open source (https://github.com/MolecularAI/route-distances).
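The twin-network idea in the abstract can be illustrated with a minimal sketch. It is not the implementation from the repository above. A small recursive tree encoder stands in for the tree LSTM, and the nested-tuple route format, the names `make_params`, `encode`, `predicted_distance` and `squared_error`, and the use of the Euclidean distance between the two encodings are all assumptions made for illustration. What the sketch does show is the defining property of a twin network: the same encoder, with the same weights, is applied to both routes, and the predicted distance is compared against a TED target.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy route format: (node label, list of child routes).
Route = Tuple[str, list]
DIM = 8  # size of the route encoding (arbitrary choice for this sketch)


def make_params(seed: int = 0) -> Dict[str, List[List[float]]]:
    """Create one set of random weights, shared by both branches of the twin."""
    rng = random.Random(seed)

    def matrix() -> List[List[float]]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]

    return {"w_self": matrix(), "w_child": matrix()}


def label_vector(label: str) -> List[float]:
    """Toy featurization: fold character codes of the label into DIM bins."""
    vec = [0.0] * DIM
    for ch in label:
        vec[ord(ch) % DIM] += 1.0
    return vec


def matvec(mat: List[List[float]], vec: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def encode(route: Route, params: Dict[str, List[List[float]]]) -> List[float]:
    """Encode a route bottom-up (a simplified stand-in for a tree LSTM)."""
    label, children = route
    child_sum = [0.0] * DIM
    for child in children:
        child_enc = encode(child, params)
        child_sum = [a + b for a, b in zip(child_sum, child_enc)]
    own = matvec(params["w_self"], label_vector(label))
    below = matvec(params["w_child"], child_sum)
    return [math.tanh(a + b) for a, b in zip(own, below)]


def predicted_distance(r1: Route, r2: Route, params) -> float:
    """Twin step: encode both routes with the SAME weights, then compare."""
    h1, h2 = encode(r1, params), encode(r2, params)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def squared_error(r1: Route, r2: Route, ted: float, params) -> float:
    """Training signal: squared difference between prediction and TED target."""
    return (predicted_distance(r1, r2, params) - ted) ** 2


if __name__ == "__main__":
    params = make_params(seed=1)
    route_a = ("target", [("intermediate", [("sm1", []), ("sm2", [])])])
    route_b = ("target", [("sm1", []), ("sm3", [])])
    # The TED target below is a made-up number, used only to show the call.
    print(predicted_distance(route_a, route_b, params))
    print(squared_error(route_a, route_b, 3.0, params))
```

Because both routes pass through the same encoder, the predicted distance is symmetric and is zero for identical routes by construction. Once routes are encoded, a distance needs only a vector comparison rather than a tree edit computation, which is one plausible reason a learned model of this kind can be much faster than the TED.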

Keywords

computer-aided synthesis prediction
tree edit distance
clustering
tree LSTM
