Expanding the chemical space using a Chemical Reaction Knowledge Graph

25 August 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


In this work, we present a new de novo molecular design approach that uses a knowledge graph encoding of chemical reactions extracted from the publicly available USPTO (United States Patent and Trademark Office) dataset. The method expands the chemical space by performing forward synthesis prediction on the knowledge graph, and it can generate libraries of de novo compounds along with valid synthetic routes. The forward synthesis prediction of novel compounds involves two steps. In the first step, a graph neural network-based link prediction model suggests pairs of existing reactant nodes in the graph that are likely to react. In the second step, a molecular transformer model predicts the potential products for the suggested reactant pairs. We achieve a ROC-AUC score of 0.861 for link prediction in the knowledge graph and a top-1 accuracy of 0.924 for product prediction. The method’s utility is demonstrated by generating a set of de novo compounds from high-probability predicted reactions in USPTO. The generated compounds are diverse, and many exhibit drug-like properties. Further, evaluation of potential activity using a quantitative structure–activity relationship (QSAR) model suggested the presence of potential dopamine receptor D2 (DRD2) modulators among the proposed compounds. In summary, our results suggest that the proposed method can expand the easily accessible chemical space and identify novel drug-like compounds for a specific target.
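The two-step workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation. A simple Jaccard neighbor-overlap score stands in for the graph neural network link predictor, a placeholder function stands in for the molecular transformer, and the node labels `A` to `E`, the toy graph, the threshold, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

# Toy reactant graph (hypothetical, not USPTO data): each node maps to the
# set of reactants it is already known to react with.
KNOWN_PARTNERS: Dict[str, Set[str]] = {
    "A": {"B", "C", "E"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
    "E": {"A"},
}


def jaccard_score(graph: Dict[str, Set[str]], u: str, v: str) -> float:
    """Neighbor-overlap score; a stand-in for a learned link predictor."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def suggest_pairs(
    graph: Dict[str, Set[str]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Step 1: score every unconnected reactant pair and keep likely ones."""
    scored = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already a known reaction, nothing new to predict
        score = jaccard_score(graph, u, v)
        if score >= threshold:
            scored.append((u, v, score))
    # Highest-scoring candidate pairs first.
    return sorted(scored, key=lambda item: item[2], reverse=True)


def expand(
    graph: Dict[str, Set[str]],
    predict_product: Callable[[str, str], str],
    threshold: float = 0.6,
) -> List[Tuple[str, str, str]]:
    """Step 2: run a product predictor on each suggested reactant pair."""
    return [
        (u, v, predict_product(u, v))
        for u, v, _ in suggest_pairs(graph, threshold)
    ]


def placeholder_product(u: str, v: str) -> str:
    """Placeholder only; a real pipeline would call a product-prediction model."""
    return f"product_of({u},{v})"


if __name__ == "__main__":
    for u, v, product in expand(KNOWN_PARTNERS, placeholder_product):
        print(u, v, product)
```

Passing the product predictor in as a callable keeps the two steps separate, so either stand-in could be replaced by a trained model without changing the surrounding logic.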


link prediction
chemical reactions
synthesis prediction
forward synthesis prediction
chemical space
de novo design
knowledge graph
reaction graph

