ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction

preprint
submitted on 19.01.2020 and posted on 21.01.2020 by Wesley Wei Qian, Nathan T. Russell, Claire L. W. Simons, Yunan Luo, Martin D. Burke, Jian Peng
Accurate in silico models for predicting the outcomes of novel chemical reactions can guide the rapid discovery of new reactivity and enable novel synthesis strategies for newly discovered lead compounds. Recent advances in machine learning, driven by deep learning models and data availability, have shown utility throughout synthetic organic chemistry as a data-driven method for reaction prediction. Here we present a machine-intelligence approach that predicts the products of an organic reaction by integrating deep neural networks with probabilistic and symbolic inference that flexibly enforces chemical constraints and accounts for prior chemical knowledge. We first train a graph convolutional neural network to estimate the likelihood of changes in covalent bonds, hydrogen counts, and formal charges. These estimated likelihoods define a probability distribution over potential products. Integer Linear Programming is then used to infer the most probable products from this distribution, subject to heuristic rules such as the octet rule and to chemical constraints that reflect a user's prior knowledge. Our approach outperforms previous graph-based neural networks by predicting products with more than 90% accuracy, demonstrates intuitive chemical reasoning through a learned attention mechanism, and generalizes across various reaction types. Furthermore, we demonstrate the potential for even higher model accuracy when expert chemists contribute to the system, boosting both machine and expert performance. The results show the advantages of empowering deep learning models with chemical intuition and knowledge to expedite the drug discovery process.
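
The inference step described in the abstract can be illustrated with a small sketch: given per-change probabilities, pick the most probable set of bond changes that still satisfies a chemical constraint. This is not the authors' implementation. It replaces the Integer Linear Programming solver with exhaustive enumeration, which is only practical for a handful of candidates, and it uses a simple per-atom valence cap as a stand-in for the octet rule. All atom labels, probabilities, valences, and function names below are hypothetical values chosen for illustration.

```python
import math
from itertools import product

# Hypothetical toy inputs; none of these numbers come from the preprint.
# Each candidate is (atom_a, atom_b, change_in_bond_order, predicted_probability).
candidates = [
    ("C1", "O2", +1, 0.9),   # form a C1-O2 bond
    ("C1", "Br3", -1, 0.8),  # break the C1-Br3 bond
    ("C1", "N4", +1, 0.6),   # form a C1-N4 bond (competes with C1-O2)
]
# Assumed current total bond order per atom and the assumed cap per atom.
current_valence = {"C1": 4, "O2": 1, "Br3": 1, "N4": 2}
max_valence = {"C1": 4, "O2": 2, "Br3": 1, "N4": 3}


def log_likelihood(selection):
    """Log-probability of a 0/1 selection, treating changes as independent."""
    total = 0.0
    for chosen, (_, _, _, p) in zip(selection, candidates):
        total += math.log(p if chosen else 1.0 - p)
    return total


def satisfies_valence(selection):
    """Check that every atom stays within [0, max_valence] after the changes."""
    valence = dict(current_valence)
    for chosen, (a, b, delta, _) in zip(selection, candidates):
        if chosen:
            valence[a] += delta
            valence[b] += delta
    return all(0 <= valence[atom] <= max_valence[atom] for atom in valence)


def most_probable_feasible():
    """Enumerate all selections and return the feasible one with highest likelihood."""
    feasible = [
        s for s in product((0, 1), repeat=len(candidates)) if satisfies_valence(s)
    ]
    return max(feasible, key=log_likelihood)


best = most_probable_feasible()
print(best)
```

In this toy case, accepting every change whose probability exceeds 0.5 would give carbon five bonds, so the constraint forces a choice between the two competing bond formations, and the higher-probability one is kept. An ILP formulation would encode the same objective and constraints as linear expressions over binary variables so that a solver can handle realistic molecule sizes.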

Email Address of Submitting Author

weiqian3@illinois.edu

Institution

University of Illinois at Urbana-Champaign

Country

United States

ORCID For Submitting Author

0000-0003-0726-575X

Declaration of Conflict of Interest

The authors declare no competing interests.
