Abstract
We present a deep-learning model for inferring missing molecules in reaction equations. Such a model has several useful applications. First, it can infer the reagents and solvents required by chemical transformations that are specified only in terms of their main compounds, as typically produced by retrosynthetic analyses. Completing reaction equations with the necessary reagents makes them compatible with deep-learning models that rely on a complete reaction specification. Second, it can curate existing datasets by detecting missing compounds, such as reagents that are essential for given classes of reactions. Finally, this model generalizes models for forward reaction prediction and retrosynthetic analysis, since both tasks can be formulated in terms of incomplete reaction equations. We show that a single trained model, based on the transformer architecture and acting on reaction SMILES strings, can address all three applications.
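To make the idea of "incomplete reaction equations" concrete, the sketch below takes a complete reaction SMILES of the form `reactants>agents>products` and deletes every molecule of one role. Dropping the agents gives a reagent-completion query, dropping the products gives a forward-prediction query, and dropping the reactants gives a retrosynthesis query. This is an illustrative assumption only: the function names `parse_reaction_smiles` and `drop_role` are hypothetical, and the paper's actual input encoding (for example, how the missing positions are marked, or whether individual molecules rather than whole roles are removed) is not specified here.

```python
from typing import List, Tuple


def parse_reaction_smiles(rxn: str) -> Tuple[List[str], List[str], List[str]]:
    """Split 'reactants>agents>products' into three lists of molecule SMILES.

    Simplification: every dot-separated fragment is treated as one molecule,
    so multi-fragment species such as salts are split apart.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn!r}")
    return tuple([m for m in p.split(".") if m] for p in parts)


def drop_role(rxn: str, role: str) -> Tuple[str, str]:
    """Hypothetical helper: remove all molecules of one role.

    Returns (incomplete reaction SMILES, dot-joined SMILES of removed molecules).
    """
    order = ("reactants", "agents", "products")
    if role not in order:
        raise ValueError(f"role must be one of {order}, got {role!r}")
    roles = dict(zip(order, parse_reaction_smiles(rxn)))
    target = ".".join(roles[role])
    roles[role] = []  # the slot stays, but it is now empty
    incomplete = ">".join(".".join(roles[r]) for r in order)
    return incomplete, target


# Fischer esterification: acetic acid + ethanol, sulfuric acid as catalyst.
rxn = "CC(=O)O.OCC>OS(=O)(=O)O>CCOC(C)=O.O"
for role in ("agents", "products", "reactants"):
    print(role, drop_role(rxn, role))
```

In each case, the model's task would be to predict the second element of the pair from the first, so one network trained on such examples covers all three query types.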
Workshop paper at the Machine Learning for Molecules Workshop at NeurIPS 2020.