Quantitative Interpretation Explains Machine Learning Models for Chemical Reaction Prediction and Uncovers Bias

David Peter Kovacs; William McCorkindale; Alpha Lee

doi:10.26434/chemrxiv.13061402.v1

Theoretical and Computational Chemistry


08 October 2020, Version 1

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Organic synthesis remains a stumbling block in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black boxes. It is clear neither whether the models make correct predictions because they have inferred the salient chemistry, nor which training data they rely on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes and how to understand counterintuitive predictions by scrutinising the data. Additionally, we identify "Clever Hans" predictions, where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.

Keywords

Artificial Intelligence

Bias

Molecular Transformer

SMILES

Chemical Reactions

computer assisted synthesis planning


Now Published

Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias

Dávid Péter Kovács, William McCorkindale, Alpha A. Lee

Journal article

Nature Communications, Volume 12, Issue 1

Online publication date: Mar 16, 2021

Version History

Oct 08, 2020 Version 1

Metrics

Views: 3,863

Downloads: 582

License

The content is available under CC BY NC ND 4.0


Author’s competing interest statement

The authors declare no conflict of interest.
