Clustering of Synthetic Routes Using Tree Edit Distance
Preprint submitted on 14.12.2020, 09:30 and posted on 15.12.2020, 12:50 by Samuel Genheden, Ola Engkvist, Esben Jannik Bjerrum
We present a novel algorithm that computes the distance between synthesis routes using a tree edit distance calculation. Such distances can be used to cluster the synthesis routes produced by a retrosynthesis prediction tool. We show that clustering the routes from a retrosynthesis analysis takes less than ten seconds on average and constitutes only seven percent of the total time (prediction + clustering). Furthermore, we show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized. The algorithm is included in the latest version of the open-source AiZynthFinder software.
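To make the idea of a tree edit distance between routes concrete, the following is a minimal sketch and not the implementation shipped with AiZynthFinder. It assumes each route is an ordered tree of molecule labels written as `(label, children)` tuples, uses unit costs for insert, delete and relabel, and groups routes with a simple distance threshold. The names `forest_distance`, `tree_distance` and `cluster_routes`, the toy routes, and the threshold are all hypothetical choices made for illustration.

```python
from functools import lru_cache
from itertools import combinations

# Assumption: a tree is (label, children), where children is a tuple of trees,
# and a forest is a tuple of trees. Tuples are hashable, so results can be cached.


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Ordered edit distance between two forests with unit edit costs."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, children = g[-1]
        return forest_distance(f, g[:-1] + children) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, children = f[-1]
        return forest_distance(f[:-1] + children, g) + 1
    (f_label, f_children), (g_label, g_children) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_children, g) + 1
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Map the two rightmost roots onto each other (cost 1 if labels differ),
    # then solve their subtrees and the remaining forests independently.
    match = (
        forest_distance(f_children, g_children)
        + forest_distance(f[:-1], g[:-1])
        + (f_label != g_label)
    )
    return min(delete, insert, match)


def tree_distance(a, b):
    """Edit distance between two route trees."""
    return forest_distance((a,), (b,))


def cluster_routes(routes, threshold):
    """Group route indices whose pairwise distance is <= threshold.

    This is threshold-based single linkage via union-find, used here only
    as a simple stand-in for a proper clustering method.
    """
    parent = list(range(len(routes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(routes)), 2):
        if tree_distance(routes[i], routes[j]) <= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(routes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy routes (hypothetical labels): target "T" made from different precursors.
route_a = ("T", (("A", ()), ("B", ())))
route_b = ("T", (("A", ()), ("C", ())))
route_c = ("T", (("X", (("Y", ()), ("Z", ()))),))

routes = [route_a, route_b, route_c]
clusters = cluster_routes(routes, threshold=1)
```

In this toy case `route_a` and `route_b` differ in a single precursor and end up in the same group, while `route_c` has a different shape and stays on its own. A real route tree would also carry reaction nodes and could use non-unit costs, for example a relabel cost based on molecular similarity; those choices are left open here.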