ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Prediction of Chemical Reaction Yields using Deep Learning

preprint
submitted on 04.08.2020 and posted on 05.08.2020 by Philippe Schwaller, Alain C. Vaucher, Teodoro Laino, Jean-Louis Reymond
Artificial intelligence is driving one of the most important revolutions in organic chemistry. Multiple platforms, including machine-learning tools for reaction prediction and synthesis planning, have become part of organic chemists' daily laboratory work, assisting with domain-specific synthetic problems. Unlike reaction prediction and retrosynthesis models, reaction yield models have received less attention, despite the enormous potential of accurate yield prediction. Reaction yield models, which predict the percentage of the reactants that is converted to the desired products, could guide chemists toward high-yielding reactions and help score synthesis routes, reducing the number of experimental attempts. So far, yield predictions have predominantly been performed for high-throughput experiments using a categorical (one-hot) encoding of reactants, concatenated molecular fingerprints, or computed chemical descriptors. Here, we extend the application of natural language processing architectures to predict reaction properties from a text-based representation of the reaction, using a transformer encoder model combined with a regression layer. We demonstrate outstanding prediction performance on two high-throughput experiment reaction sets. An analysis of the yields reported in the open-source USPTO data set shows that their distribution differs depending on the mass scale, which limits the data set's applicability to reaction yield prediction.
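To make the idea of a text-based reaction representation concrete, the following is a minimal sketch of how a reaction string might be split into tokens and mapped to integer ids before being passed to an encoder. It assumes reaction SMILES as the text format (the abstract does not name one). The token pattern, `tokenize_reaction`, and `build_vocab` are hypothetical and simplified for illustration; they are not the preprint's tokenizer or model code.

```python
import re
from typing import Dict, Iterable, List

# Simplified SMILES token pattern. This is an assumption for illustration only;
# the abstract does not describe the tokenizer used in the preprint.
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]"               # bracket atoms such as [NH4+] or [Pd]
    r"|Br|Cl"                   # two-letter halogens, tried before single letters
    r"|>>"                      # separator between reactants and products
    r"|%\d{2}"                  # two-digit ring-closure labels
    r"|[A-Za-z]"                # single-letter atoms (aliphatic or aromatic)
    r"|\d"                      # single-digit ring-closure labels
    r"|[=#\-+\\/()\.:~@*$>]"    # bonds, branches, and molecule separators
)


def tokenize_reaction(reaction_smiles: str) -> List[str]:
    """Split a reaction SMILES string into tokens, failing on unknown characters."""
    tokens = TOKEN_PATTERN.findall(reaction_smiles)
    # findall silently skips unmatched characters, so verify nothing was lost.
    if "".join(tokens) != reaction_smiles:
        raise ValueError(f"Could not tokenize: {reaction_smiles!r}")
    return tokens


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


if __name__ == "__main__":
    # Ethanol + acetic acid -> ethyl acetate, written as a reaction SMILES.
    reaction = "CCO.CC(=O)O>>CC(=O)OCC"
    tokens = tokenize_reaction(reaction)
    vocab = build_vocab([tokens])
    token_ids = [vocab[t] for t in tokens]
    print(tokens)
    print(token_ids)
```

In a yield model of the kind the abstract describes, a sequence of ids like `token_ids` would be the input to the encoder, and the regression layer would map the encoder output to a single number, the predicted yield.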

Email Address of Submitting Author

phs@zurich.ibm.com

Institution

IBM Research -- Europe / University of Bern

Country

Switzerland

ORCID For Submitting Author

0000-0003-3046-6576

Declaration of Conflict of Interest

No conflict of interest.
