Can Organic Chemistry Literature Enable Machine Learning Yield Prediction ?

Jules Schleinitz; Maxime Langevin; Yanis Smail; Benjamin Wehnert; Laurence Grimaud; Rodolphe Vuilleumier

doi:10.26434/chemrxiv-2022-t6435

Chemical Engineering and Industrial Chemistry

25 March 2022, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Synthetic yield prediction using machine learning is intensively studied. While previous work focused on an ideal use case, High-Throughput Experiment datasets, predicting yields using literature data remains elusive. We built a large literature-based dataset of more than a thousand reactions, focusing on the activation of carbon-oxygen bonds of phenol derivatives under nickel catalysis. Detailed reaction conditions and associated yields were manually curated and stored in an open-access database. We assessed the performance of state-of-the-art machine learning models on this dataset, and explored their ability to make predictions on novel publications, coupling partners and substrates. Our work shows that on well-designed yield prediction tasks, machine learning can have practical applications, and provides a unique public database for further improvements of these methods adapted to literature chemical data.
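The evaluation on novel publications mentioned in the abstract can be read as a grouped train/test split, in which every reaction from a given publication falls on the same side of the split. The following is a minimal sketch under stated assumptions, not the authors' code: the record fields `doi` and `yield`, the function name `split_by_publication`, and the toy records are hypothetical and are not taken from the NiCOlit dataset.

```python
import random
from collections import defaultdict


def split_by_publication(reactions, test_fraction=0.2, seed=0):
    """Split reaction records so that no publication appears in both sets.

    `reactions` is a list of dicts carrying a hypothetical "doi" key that
    identifies the source publication of each reaction.
    """
    # Group reactions by their source publication.
    by_doi = defaultdict(list)
    for reaction in reactions:
        by_doi[reaction["doi"]].append(reaction)

    # Shuffle publications (not individual reactions) reproducibly.
    dois = sorted(by_doi)
    random.Random(seed).shuffle(dois)

    # Hold out whole publications; always keep at least one for testing.
    n_test = max(1, round(len(dois) * test_fraction))
    test_dois = set(dois[:n_test])

    train = [r for d in dois if d not in test_dois for r in by_doi[d]]
    test = [r for d in dois if d in test_dois for r in by_doi[d]]
    return train, test


# Toy records, invented for illustration only.
toy = [
    {"doi": "pub-A", "yield": 85.0},
    {"doi": "pub-A", "yield": 40.0},
    {"doi": "pub-B", "yield": 12.0},
    {"doi": "pub-C", "yield": 67.0},
    {"doi": "pub-C", "yield": 91.0},
    {"doi": "pub-D", "yield": 55.0},
]
train, test = split_by_publication(toy, test_fraction=0.25)
```

Splitting at the publication level rather than the reaction level avoids reactions from the same paper appearing in both training and test sets, which would make the task easier than predicting on a genuinely unseen publication. The same grouping idea could be applied with a coupling-partner or substrate key in place of the publication key.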

Keywords

Dataset

Machine Learning

Reaction Yield Prediction

Supplementary materials

Title: Supplementary Informations

Description: Details on the code and the methods used to train the model and featurize the data. Additional information supporting the main manuscript.

Supplementary weblinks

Title: NiCOlit code and data

Description: The NiCOlit dataset is available. The code used to generate the results is available.

Version History

May 18, 2022 Version 2

Mar 25, 2022 Version 1

Metrics

Views: 2,905

Downloads: 1,278

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv-2022-t6435

Funding

ANRT

2019/0821

CNRS

ENS

Author’s competing interest statement

M.L. is a Sanofi employee and may hold shares and/or stock options in the company. J.S., B.W., Y.S., R.V., and L.G. declare that they have no competing interests.

Ethics

The author(s) declare that they have sought and gained approval from the relevant ethics committee/IRB for this research and its publication.
