ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Quantitative Interpretation Explains Machine Learning Models for Chemical Reaction Prediction and Uncovers Bias

preprint
submitted on 07.10.2020 and posted on 08.10.2020 by David Peter Kovacs, William McCorkindale, Alpha Lee

Organic synthesis remains a stumbling block in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black boxes. It is unclear whether the models make correct predictions because they have inferred the salient chemistry, and it is equally unclear which training data they rely on to reach a prediction. This opacity hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of the reactants and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes and how to understand counterintuitive predictions by scrutinising the data. Additionally, we identify "Clever Hans" predictions, where the correct prediction is reached for the wrong reason because of dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.

Email Address of Submitting Author

dpk25@cam.ac.uk

Institution

University of Cambridge

Country

United Kingdom

ORCID For Submitting Author

0000-0002-0854-2635

Declaration of Conflict of Interest

The authors declare no conflict of interest.
