Machine Learning Models Capable of Chemical Deduction for Identifying Reaction Products

Tianfan Jin; Qiyuan Zhao; Andrew B Schofield; Brett Matthew Savoie

doi:10.26434/chemrxiv-2023-l6lzp

Analytical Chemistry

14 June 2023, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or more generally wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from example, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation. Here, a general strategy is described for designing and training machine learning models capable of deduction, which consists of combining individual inductive models into a larger deductive network. The training and testing of these models are demonstrated on the task of deducing reaction products from a mixture of spectral sources. The resulting models are capable of distinguishing between intended and unintended reaction outcomes and identifying starting material based on a mixture of spectral sources. The models also perform well on tasks that they were not directly trained on, such as predicting minor products from named organic chemistry reactions, identifying reagents and isomers as plausible impurities, and handling missing or conflicting information. A new dataset of 1,124,043 simulated spectra that were generated to train these models is also distributed with this work. These findings demonstrate that deductive bottlenecks for chemical problems are not fundamentally insuperable for ML models.

Keywords

Product Identification

Spectral Interpretation

Supplementary materials

Supporting Information: Contains additional figures referenced in the main text

Metrics

Views: 1,241

Downloads: 691

License

The content is available under CC BY-NC-ND 4.0

Funding

Office of Naval Research: N00014-21-1-2476

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
