Growing strings in a chemical reaction space for searching retrosynthesis pathways

17 August 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Machine learning algorithms have shown great accuracy in predicting chemical reaction outcomes and single-step retrosynthesis. However, designing multi-step synthesis pathways remains challenging for existing machine learning models, which are trained for single-step prediction. In this manuscript, we propose a new approach that recasts the retrosynthesis problem as a string optimization problem, leveraging the similarity between chemical reactions and multidimensional geometrical vectors. On this premise, a complex multi-step synthesis can be conceptualized as a sequence (a string) that links the multidimensional vectors (fingerprints) representing its individual reaction steps. We extracted an extensive corpus of chemical syntheses from patents and converted them into multidimensional strings. While optimizing a retrosynthetic path, we use the Euclidean metric to minimize the distance between the expanded trajectory of the growing retrosynthesis string and the corpus of extracted strings. By doing so, we promote the assembly of synthetic pathways that, in the chemical reaction space, are more similar to existing retrosyntheses, thereby inheriting the strategic guidelines designed by human experts. We integrated this approach into the RXN platform and present the method's application to complex syntheses as well as its ability to produce better synthetic strategies than current methodologies.
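The string-growing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each route is stored as an array with one fingerprint vector per reaction step, and it assumes that a growing string is compared against every contiguous window of the same length in each corpus route, using the Euclidean norm of the stacked difference. The function names `window_distance`, `corpus_distance`, and `pick_next_step` are hypothetical.

```python
import numpy as np

def window_distance(partial, reference):
    """Smallest Euclidean distance between `partial` (k x d) and any
    contiguous k-step window of `reference` (n x d). Assumed comparison rule."""
    k = len(partial)
    if len(reference) < k:
        return float("inf")  # reference route is too short to compare against
    return min(
        float(np.linalg.norm(partial - reference[i:i + k]))
        for i in range(len(reference) - k + 1)
    )

def corpus_distance(partial, corpus):
    """Distance from the growing string to its closest match in the corpus."""
    return min(window_distance(partial, ref) for ref in corpus)

def pick_next_step(current, candidates, corpus):
    """Extend `current` (k x d) with each candidate fingerprint (length d) and
    return the index and distance of the extension closest to the corpus."""
    scores = [
        corpus_distance(np.vstack([current, cand]), corpus)
        for cand in candidates
    ]
    best = int(np.argmin(scores))
    return best, scores[best]

# Toy data with 2-dimensional "fingerprints" (illustrative values only).
corpus = [
    np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]),
    np.array([[5.0, 5.0], [6.0, 5.0]]),
]
current = np.array([[0.0, 0.0]])
candidates = [np.array([1.0, 0.1]), np.array([4.0, 4.0])]
best, best_score = pick_next_step(current, candidates, corpus)
```

In this toy case the first candidate is selected, because extending the string with it stays close to the opening steps of the first corpus route. A real search would use high-dimensional reaction fingerprints and would likely combine this distance with the single-step model's own confidence; how the two are weighted is not specified in the abstract.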


Chemical synthesis
Machine Learning

