Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction

21 January 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Accurate in silico models for predicting the outcomes of novel chemical reactions can guide the rapid discovery of new reactivity and enable novel synthesis strategies for newly discovered lead compounds. Recent advances in machine learning, driven by deep learning models and data availability, have shown utility throughout synthetic organic chemistry as a data-driven method for reaction prediction. Here we present a machine-intelligence approach that predicts the products of an organic reaction by integrating deep neural networks with probabilistic and symbolic inference that flexibly enforces chemical constraints and accounts for prior chemical knowledge. We first train a graph convolutional neural network to estimate the likelihood of changes in covalent bonds, hydrogen counts, and formal charges. These estimated likelihoods govern a probability distribution over potential products. Integer Linear Programming is then used to infer the most probable products from this distribution, subject to heuristic rules such as the octet rule and to chemical constraints that reflect a user's prior knowledge. Our approach outperforms previous graph-based neural networks by predicting products with more than 90% accuracy, demonstrates intuitive chemical reasoning through a learned attention mechanism, and generalizes across various reaction types. Furthermore, we demonstrate the potential for even higher model accuracy when expert chemists contribute to the system, boosting both machine and expert performance. The results show the advantages of empowering deep learning models with chemical intuition and knowledge to expedite the drug discovery process.
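
The constrained inference step described above can be illustrated with a toy sketch. This is not the authors' implementation. The atom labels, probabilities, and valence limits are hypothetical; the candidate changes are assumed to be independent; and exhaustive enumeration stands in for an Integer Linear Programming solver, which would be needed at realistic problem sizes. The sketch only shows how a valence heuristic can overrule the choice that the raw likelihoods alone would make.

```python
from itertools import product
from math import log

# Hypothetical network outputs: probability that each candidate bond change occurs.
# Each candidate is (atom_a, atom_b, change in bond order). All values are made up.
candidates = [
    ("C1", "Br", -1),  # C-Br bond breaks
    ("C1", "O", +1),   # C-O bond forms
    ("C1", "N", +1),   # competing C-N bond forms
]
probs = [0.95, 0.80, 0.60]

# Current bonded valence per atom and the maximum allowed (octet-style heuristic).
valence = {"C1": 4, "Br": 1, "O": 1, "N": 2}
max_valence = {"C1": 4, "Br": 1, "O": 2, "N": 3}


def log_likelihood(x):
    """Log-probability of a 0/1 assignment, assuming independent changes."""
    return sum(log(p) if xi else log(1.0 - p) for xi, p in zip(x, probs))


def feasible(x):
    """True if no atom leaves its allowed valence range after the selected changes."""
    new_valence = dict(valence)
    for xi, (a, b, delta) in zip(x, candidates):
        if xi:
            new_valence[a] += delta
            new_valence[b] += delta
    return all(0 <= new_valence[atom] <= max_valence[atom] for atom in new_valence)


def most_probable_outcome():
    """Solve the 0/1 program by enumeration; an ILP solver would replace this."""
    assignments = product((0, 1), repeat=len(candidates))
    return max((x for x in assignments if feasible(x)), key=log_likelihood)


best = most_probable_outcome()
selected = [c for xi, c in zip(best, candidates) if xi]
print(best, selected)
```

Without the valence check, selecting all three changes would score highest, because every probability exceeds 0.5. With it, the carbon cannot form both new bonds, so the search keeps the C-Br cleavage and the more likely C-O formation.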

Keywords

Reaction Prediction
Organic Synthesis
Machine Learning
Deep Learning
