These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Mapping the Space of Chemical Reactions using Attention-Based Neural Networks

revised on 06.08.2020 and posted on 07.08.2020 by Philippe Schwaller, Daniel Probst, Alain C. Vaucher, Vishnu H Nair, David Kreutter, Teodoro Laino, Jean-Louis Reymond

Organic reactions are usually assigned to classes that group reactions with similar reagents and mechanisms. Reaction classes facilitate the communication of complex concepts and efficient navigation through chemical reaction space. However, classification is a tedious task: it requires identifying the corresponding reaction class template by annotating the number of molecules in the reaction, the reaction center, and the distinction between reactants and reagents. In this work, we show that transformer-based models can infer reaction classes from non-annotated, simple text-based representations of chemical reactions. Our best model reaches a classification accuracy of 98.2%. We also show that the learned representations can be used as reaction fingerprints that capture fine-grained differences between reaction classes better than traditional reaction fingerprints. The unprecedented insights into chemical reaction space enabled by our learned fingerprints are illustrated by an interactive reaction atlas that provides visual clustering and similarity searching.
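The similarity searching mentioned in the abstract can be pictured as a nearest-neighbour lookup over fingerprint vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which similarity metric or vector size is used, so cosine similarity is an assumption, and the function names, reaction labels, and 4-dimensional vectors are hypothetical toy values chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_reactions(query, fingerprints, k=2):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, fp)) for name, fp in fingerprints.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy fingerprints: made-up numbers, NOT output of any real model.
toy_fingerprints = {
    "amide_coupling_a": [0.9, 0.1, 0.0, 0.2],
    "amide_coupling_b": [0.8, 0.2, 0.1, 0.1],
    "suzuki_coupling": [0.1, 0.9, 0.3, 0.0],
    "boc_deprotection": [0.0, 0.2, 0.9, 0.4],
}

# Hypothetical query vector standing in for the fingerprint of a new reaction.
query = [0.85, 0.15, 0.05, 0.15]
hits = nearest_reactions(query, toy_fingerprints, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

In this toy setup the two amide-coupling entries rank highest because their vectors point in nearly the same direction as the query. With real learned fingerprints the same lookup would run over much higher-dimensional vectors, typically with an approximate nearest-neighbour index rather than a full scan.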



Interactive reaction atlas:


DP and JLR: NCCR TransCure - From transport physiology to identification of therapeutic targets. Swiss National Science Foundation


IBM Research Zurich / University of Bern



Declaration of Conflict of Interest

No conflict of interest

Version Notes

- Additional experiments and code
- Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS)