Completion of Partial Reaction Equations

24 November 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


We present a deep-learning model for inferring missing molecules in reaction equations. Such a model has several useful applications. First, it can infer the reagents and solvents required for chemical transformations that are specified only in terms of their main compounds, as is often the case for the output of retrosynthetic analyses. Completing such equations with the necessary reagents makes them compatible with deep-learning models that rely on a complete reaction specification. Second, it can curate existing datasets by detecting missing compounds, such as reagents that are essential for a given class of reactions. Finally, the model generalizes models for forward reaction prediction and retrosynthetic analysis, since both tasks can be formulated in terms of incomplete reaction equations. We illustrate that a single trained model, based on the transformer architecture and acting on reaction SMILES strings, can address all three points.
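To make the notion of an incomplete reaction equation concrete, the following minimal Python sketch shows how forward prediction, retrosynthesis, and reagent completion can all be written as the same reaction SMILES string with one part left open. It is an illustration only and not the authors' implementation: the `?` placeholder, the helper names, and the esterification example are assumptions introduced here, and the abstract does not specify how the model encodes the missing part.

```python
from typing import List, NamedTuple

# Hypothetical placeholder for the part of the equation to be completed.
# The actual encoding used by the model is not given in the abstract.
MASK = "?"


class Reaction(NamedTuple):
    reactants: List[str]
    agents: List[str]  # reagents, solvents, catalysts
    products: List[str]


def parse_reaction_smiles(rxn: str) -> Reaction:
    """Split 'reactants>agents>products' into lists of molecule SMILES.

    Note: splitting on '.' also separates the fragments of multi-fragment
    compounds such as salts; this sketch ignores that subtlety.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    return Reaction(*[[m for m in part.split(".") if m] for part in parts])


def to_reaction_smiles(reaction: Reaction) -> str:
    """Join the three groups back into a reaction SMILES string."""
    return ">".join(".".join(group) for group in reaction)


def forward_query(reaction: Reaction) -> Reaction:
    """Forward prediction: the products are the missing part."""
    return Reaction(reaction.reactants, reaction.agents, [MASK])


def retro_query(reaction: Reaction) -> Reaction:
    """Retrosynthesis: everything except the products is missing."""
    return Reaction([MASK], [], reaction.products)


def reagent_query(reaction: Reaction) -> Reaction:
    """Reagent completion: only the main compounds are given."""
    return Reaction(reaction.reactants, [MASK], reaction.products)


# Illustrative example: acid-catalyzed esterification of acetic acid with ethanol.
full = parse_reaction_smiles("CC(=O)O.CCO>OS(=O)(=O)O>CC(=O)OCC")
for make_query in (forward_query, retro_query, reagent_query):
    print(make_query.__name__, to_reaction_smiles(make_query(full)))
```

Under these assumptions, the three tasks differ only in which part of the string is left open, which is the sense in which one completion model can cover all of them.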

Workshop paper at the Machine Learning for Molecules Workshop at NeurIPS 2020.


Reaction SMILES
Reaction equation
Machine Learning
Deep Learning
Molecular Transformer
Chemical Reactions

