These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.
Prediction of Chemical Reactions Using Statistical Models of Chemical Knowledge

Submitted on 10.08.2018, 12:15 and posted on 10.08.2018, 14:47 by Philipp-Maximilian Jacob, Alexei Lapkin
Is chemistry discoverable, or can it only be invented? This is the question a computer scientist and a philosopher of science would ask when looking at the application of artificial intelligence methods to the development of new chemical entities and new chemical transformations. This study confirms that, at least today, chemistry is, in part, discoverable from the past history of chemical research: the accumulated chemical data contain hidden rules of chemistry, which can be exploited to discover new reaction pathways. This is shown using a stochastic block model approach, trained on chemical reaction data obtained from Reaxys®.


P.-M. Jacob would like to thank Peterhouse and the University of Cambridge for funding in the form of a PhD studentship. We gratefully acknowledge collaboration with RELX Intellectual Properties SA, Elsevier and their technical support, which enabled us to mine Reaxys. This work was funded, in part, by the EPSRC project “Terpene-based manufacturing for sustainable chemical feedstocks” EP/K014889. This project is funded in part by the National Research Foundation (NRF), Prime Minister’s Office, Singapore, under its Campus for Research Excellence and Technological Enterprise (CREATE) program as a part of the Cambridge Centre for Advanced Research and Education in Singapore Ltd (CARES).


University of Cambridge


United Kingdom

Declaration of Conflict of Interest

Authors declare no conflict of interest

Version Notes

This version will be updated to add a second validation method and to correct some minor errors in the text and in the examples shown.