Automated Extraction of Chemical Synthesis Actions from Experimental Procedures

10 April 2020, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. Extracting the details necessary to reproduce and validate a synthesis in a chemical laboratory is often a tedious task that requires extensive human intervention. We present a method to convert unstructured experimental procedures written in English into structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model, based on the transformer architecture, that converts experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on a smaller set of manually annotated samples. Predictions on our test set resulted in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences.
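
To make the idea of an action sequence and a partial match concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Action` class, the action names, and the example sentence are hypothetical, and the abstract does not say how the match percentage is computed. The sketch assumes a normalized Levenshtein (edit-distance) similarity over whole actions, which is only one plausible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One synthesis step: an action name plus its properties (hypothetical schema)."""
    name: str
    properties: tuple = ()


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x by y
        prev = curr
    return prev[-1]


def match_percentage(predicted, reference):
    """Assumed metric: 100 * (1 - edit distance / length of the longer sequence)."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(predicted, reference) / longest)


# Hypothetical annotation for "Ethanol was added, the mixture was stirred
# for 2 h, and the solid was filtered off."
reference = [Action("ADD", ("ethanol",)), Action("STIR", ("2 h",)), Action("FILTER")]
# Hypothetical model output with a wrong stirring duration.
predicted = [Action("ADD", ("ethanol",)), Action("STIR", ("1 h",)), Action("FILTER")]

print(f"{match_percentage(predicted, reference):.1f}% match")
```

Under this assumed metric, a sentence counts toward the "75% match" figure when its score is at least 75, so the thresholds are cumulative and a perfect match also counts toward the lower ones.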

Keywords

Machine Learning
Organic Chemistry
Chemical Reactions
Deep Learning
Experimental Procedures
Automation
Organic Synthesis

Supplementary materials

si
supplementary data 1 actions for test set
supplementary data 2 top 5 sequences
supplementary data 3 annotation guideline
supplementary data 4 onmt config
