Abstract
Identifying synthesis routes from knowledge graphs poses challenges beyond those of retrosynthesis, including path-finding artifacts and data issues. We introduce “SynGPS”, a novel algorithm that overcomes these limitations by identifying viable routes even when common artifacts are present. SynGPS can resolve nonsensical cycles, disconnect misleading links to starting materials, and remove ambiguous reactions, relying solely on topological heuristics for flexible scoring. We also present the Backtracking-Oriented Yield Aggregation (BOYA) algorithm, a new molar ratio-based method for calculating synthesis yield that addresses the molecular weight biases of existing weight-based approaches. Case studies demonstrate the effectiveness of the SynGPS and BOYA algorithms, and we provide a rigorous theoretical framework that can facilitate the comparison of existing and future methods in the field of computer-aided synthesis planning (CASP).
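The molecular weight bias mentioned above can be illustrated with a small sketch. This is not the BOYA algorithm itself, only the textbook contrast between a weight-based ratio and a molar yield for a single step, plus the usual product rule for a linear sequence of steps. All function names and numeric values below are hypothetical and chosen only for illustration.

```python
import math


def weight_ratio(product_mass_g, reactant_mass_g):
    """Weight-based ratio: grams of product per gram of reactant."""
    return product_mass_g / reactant_mass_g


def molar_yield(product_mass_g, product_mw, reactant_mass_g, reactant_mw,
                product_per_reactant=1.0):
    """Molar yield: moles of product obtained divided by the moles of
    product expected from the limiting reactant.

    product_per_reactant is the stoichiometric number of moles of product
    formed per mole of reactant (assumed 1:1 by default).
    """
    product_mol = product_mass_g / product_mw
    reactant_mol = reactant_mass_g / reactant_mw
    return product_mol / (reactant_mol * product_per_reactant)


def linear_route_yield(step_yields):
    """Overall molar yield of a strictly linear route: product of step yields."""
    return math.prod(step_yields)


# Hypothetical step: 10 g of a reactant (100 g/mol) gives 16 g of product (200 g/mol).
w = weight_ratio(16.0, 10.0)               # exceeds 1 because the product is heavier
m = molar_yield(16.0, 200.0, 10.0, 100.0)  # roughly 0.8, independent of molecular weights
overall = linear_route_yield([m, 0.5])     # hypothetical second step with 50% molar yield
print(f"weight ratio: {w:.2f}, molar yield: {m:.2f}, overall: {overall:.2f}")
```

In this hypothetical step the weight-based ratio is larger than 1 even though one fifth of the reactant did not end up as product, which is the kind of distortion a molar formulation avoids.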
Supplementary weblinks
Title
Source Code and Data Repository of the SynGPS and BOYA Algorithms
Description
This repository contains the implementation of the SynGPS and Backtracking-Oriented Yield Aggregation (BOYA) algorithms. The SynGPS algorithm can identify self-contained synthesis routes in a synthesis graph while attempting to resolve issues caused by typical artifacts stemming from data issues and from path- or walk-finding. The BOYA algorithm computes the aggregated yield of synthesis routes in an innovative fashion, focusing on the molar ratios between the target molecule and the starting materials. The BOYA algorithm is implemented in a linear-algebraic, and hence GPU-friendly, manner. The functionalities are available in the "main" branch as a Python library that can be installed via pip (please consult the README for details). For replication purposes, the "publication" branch provides the data and the exact code used to develop the enclosed case studies.