On the Use of Real-World Datasets for Reaction Yield Prediction

Mandana Saebi; Bozhao Nan; John Herr; Jessica Wahlers; Zhichun Guo; Andrzej Zurański; Thierry Kogej; Per-Ola Norrby; Abigail Doyle; Olaf Wiest; Nitesh Chawla

doi:10.26434/chemrxiv-2021-2x06r-v3

Theoretical and Computational Chemistry

27 September 2021, Version 3

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide large, less biased datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed, and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield prediction, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki and Buchwald-Hartwig reactions. However, training the AGNN on the ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield prediction.

Keywords

Deep Learning

Graph Neural Networks

Yield prediction

Now Published

On the use of real-world datasets for reaction yield prediction

Mandana Saebi, Bozhao Nan, John E. Herr, Jessica Wahlers, Zhichun Guo, Andrzej M. Zurański, Thierry Kogej, Per-Ola Norrby, Abigail G. Doyle, Nitesh V. Chawla, Olaf Wiest

Journal article

Chemical Science, Volume 14, Issue 19

Online publication date: 2023

Version History

Sep 27, 2021 Version 3

Jun 15, 2021 Version 2

May 17, 2021 Version 1

Version Notes

In this version, the results are updated and new figures are added. The materials and methods are described in more detail, and the introduction and discussion are also updated.

Metrics

Views: 7,655

Downloads: 5,964

License

The content is available under CC BY-NC-ND 4.0

Funding

National Science Foundation

CHE-1925607

AstraZeneca

Author’s competing interest statement

No conflict of interest

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
