Accurate in silico models for predicting the outcomes of novel chemical reactions can guide the rapid discovery of new reactivity and enable novel synthesis strategies for newly discovered lead compounds. Recent advances in machine learning, driven by deep learning models and data availability, have shown utility throughout synthetic organic chemistry as a data-driven method for reaction prediction. Here we present a machine-intelligence approach that predicts the products of an organic reaction by integrating deep neural networks with probabilistic and symbolic inference that flexibly enforces chemical constraints and accounts for prior chemical knowledge. We first train a graph convolutional neural network to estimate the likelihood of changes in covalent bonds, hydrogen counts, and formal charges. These estimated likelihoods govern a probability distribution over potential products. Integer linear programming (ILP) is then used to infer the most probable products from this distribution, subject to heuristic rules such as the octet rule and to chemical constraints that reflect a user's prior knowledge. Our approach outperforms previous graph-based neural networks by predicting products with more than 90% accuracy, demonstrates intuitive chemical reasoning through a learned attention mechanism, and generalizes across various reaction types. Furthermore, we demonstrate the potential for even higher model accuracy when expert chemists contribute to the system, boosting both machine and expert performance. The results show the advantages of empowering deep learning models with chemical intuition and knowledge to expedite the drug discovery process.
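The constrained inference step described above can be illustrated with a minimal sketch. It is not the authors' formulation: the atoms, candidate bond changes, and scores below are hypothetical values standing in for network outputs, a simple per-atom valence cap stands in for the octet rule, and exhaustive enumeration over binary variables replaces a real integer linear programming solver so that the example needs only the standard library.

```python
from itertools import product

# Hypothetical toy system loosely modeled on CH3Br + OH- (assumption, not from the paper).
# Valence here counts bonds per atom; the cap is a simplified stand-in for the octet rule.
MAX_VALENCE = {"C": 4, "O": 2, "Br": 1}
CURRENT_VALENCE = {"C": 4, "O": 1, "Br": 1}

# Candidate bond changes: (atom_a, atom_b, change in bond order, score).
# Scores are made-up log-likelihood-style numbers; in the described approach
# they would come from the trained graph convolutional network.
CANDIDATES = [
    ("C", "O", +1, 2.0),    # form a C-O bond
    ("C", "Br", -1, 1.5),   # break the C-Br bond
    ("O", "Br", +1, -1.0),  # form an O-Br bond (scored as unlikely)
]


def is_feasible(selection):
    """Check that every atom stays within 0..MAX_VALENCE after the chosen changes."""
    valence = dict(CURRENT_VALENCE)
    for chosen, (a, b, delta, _score) in zip(selection, CANDIDATES):
        if chosen:
            valence[a] += delta
            valence[b] += delta
    return all(0 <= valence[atom] <= MAX_VALENCE[atom] for atom in valence)


def most_probable_changes():
    """Enumerate all 0/1 assignments and keep the feasible one with the highest score.

    This brute-force search solves the same kind of problem an ILP solver would
    (linear objective, linear constraints, binary variables), but only scales
    to a handful of candidates.
    """
    best_selection, best_score = None, float("-inf")
    for selection in product((0, 1), repeat=len(CANDIDATES)):
        if not is_feasible(selection):
            continue
        score = sum(s for chosen, (_, _, _, s) in zip(selection, CANDIDATES) if chosen)
        if score > best_score:
            best_selection, best_score = selection, score
    return best_selection, best_score


if __name__ == "__main__":
    selection, score = most_probable_changes()
    for chosen, (a, b, delta, s) in zip(selection, CANDIDATES):
        if chosen:
            print(f"{a}-{b}: bond order change {delta:+d} (score {s})")
    print("total score:", score)
```

In this toy case, forming the C-O bond alone is rejected because carbon would exceed its valence cap, so the search selects it only together with breaking the C-Br bond. A user's prior knowledge could be expressed in the same way, as an additional feasibility check on the selected changes.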
Institution: University of Illinois at Urbana-Champaign
ORCID for Submitting Author: 0000-0003-0726-575X
Declaration of Conflict of Interest: The authors declare no competing interests.