ChemRxiv

Automated Extraction of Chemical Synthesis Actions from Experimental Procedures

preprint
revised on 09.04.2020 and posted on 10.04.2020 by Alain C. Vaucher, Federico Zipoli, Joppe Geluykens, Vishnu H Nair, Philippe Schwaller, Teodoro Laino

Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. Extracting the details necessary to reproduce and validate a synthesis in a chemical laboratory is often a tedious task that requires extensive human intervention. We present a method to convert unstructured experimental procedures written in English into structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model based on the transformer architecture to convert experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on a smaller set of manually annotated samples. Predictions on our test set resulted in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences.
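To make the idea of an action sequence and a thresholded match rate concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the `Action` schema, the action names and properties, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, since the abstract does not specify how the match percentage is computed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Action:
    """One synthesis action with a name and properties (hypothetical schema)."""
    name: str
    properties: tuple = ()


def similarity(predicted, reference):
    """Similarity in [0, 1] between two action sequences.

    Assumption: difflib's ratio is used as a stand-in metric; the
    abstract does not say which measure the authors use.
    """
    return SequenceMatcher(None, predicted, reference, autojunk=False).ratio()


def fraction_at_least(scores, threshold):
    """Fraction of sentences whose similarity reaches the threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


# Hypothetical reference annotation for one sentence.
reference = [
    Action("Add", (("material", "NaOH"),)),
    Action("Stir", (("duration", "2 h"),)),
]
# One prediction that matches exactly, one with a wrong duration.
exact_prediction = list(reference)
partial_prediction = [
    Action("Add", (("material", "NaOH"),)),
    Action("Stir", (("duration", "3 h"),)),
]

scores = [
    similarity(exact_prediction, reference),
    similarity(partial_prediction, reference),
]
perfect_rate = fraction_at_least(scores, 1.0)
relaxed_rate = fraction_at_least(scores, 0.5)
```

In this sketch an action counts as matched only if its name and all of its properties agree, so a single wrong property lowers the sentence's score. Reporting the rate at several thresholds, as the abstract does with 100%, 90%, and 75%, shows how many predictions are nearly correct rather than exactly correct.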


Email Address of Submitting Author

ava@zurich.ibm.com

Institution

IBM Research Europe

Country

Switzerland

ORCID For Submitting Author

0000-0001-7554-0288

Declaration of Conflict of Interest

No conflict of interest.
