ChemRxiv

Clustering of Synthetic Routes Using Tree Edit Distance

preprint
submitted on 14.12.2020, 09:30 and posted on 15.12.2020, 12:50 by Samuel Genheden, Ola Engkvist, Esben Jannik Bjerrum
We present a novel algorithm for computing the distance between synthesis routes, based on a tree edit distance calculation. Such distances can be used to cluster the synthesis routes produced by a retrosynthesis prediction tool. We show that clustering the routes from a retrosynthesis analysis takes less than ten seconds on average and constitutes only seven percent of the total time (prediction + clustering). Furthermore, we show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized. The algorithm is included in the latest version of the open-source AiZynthFinder software.
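
To make the idea of a tree edit distance between routes concrete, the following is a minimal sketch and not the authors' implementation or the AiZynthFinder API. It assumes a route can be simplified to an ordered tree of molecule labels stored as nested tuples, uses unit costs for insertion, deletion, and relabeling, and groups routes with a plain single-linkage threshold rule. The names forest_distance, route_distance, and cluster_routes are hypothetical, as are the cost function, the tree encoding, and the clustering rule; the abstract does not specify any of them.

```python
from functools import lru_cache

# Assumption: a route is an ordered tree written as (label, children),
# where children is a tuple of subtrees. A forest is a tuple of trees.


def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert every remaining node of g
    if not g:
        return size(f)  # delete every remaining node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    match = (
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1)
    )
    return min(delete, insert, match)


def route_distance(route_a, route_b):
    """Edit distance between two routes, each a single tree."""
    return forest_distance((route_a,), (route_b,))


def cluster_routes(routes, threshold):
    """Single-linkage grouping: routes within `threshold` share a label."""
    labels = list(range(len(routes)))
    for i in range(len(routes)):
        for j in range(i + 1, len(routes)):
            if route_distance(routes[i], routes[j]) <= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


# Hypothetical toy routes: the root is the target, leaves are starting materials.
route_a = ("target", (("A", ()), ("B", ())))
route_b = ("target", (("A", ()), ("C", ())))
route_c = ("target", (("X", (("Y", ()), ("Z", ()))),))

print(route_distance(route_a, route_b))  # 1: relabel B as C
print(route_distance(route_a, route_c))  # 3: insert X, relabel A and B
print(cluster_routes([route_a, route_b, route_c], threshold=1))  # [0, 0, 2]
```

In this toy case the two one-step routes that differ in a single starting material fall into one cluster, while the two-step route stays on its own. A practical implementation would likely need a dedicated tree edit distance algorithm and a cost function that reflects chemical similarity, neither of which this sketch attempts.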


Email Address of Submitting Author

samuel.genheden@astrazeneca.com

Institution

AstraZeneca

Country

Sweden

ORCID For Submitting Author

0000-0002-7624-7363

Declaration of Conflict of Interest

No conflict of interest
