These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.
Automated Extraction of Chemical Synthesis Actions from Experimental Procedures
Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. Extracting the details necessary to reproduce and validate a synthesis in a chemical laboratory is often a tedious task that requires extensive human intervention. Here, we present a method to convert unstructured experimental procedures written in English to structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning, transformer-based model that converts experimental procedures to a textual representation of action sequences. The model is first pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach, and then refined on a smaller set of manually annotated samples. On our test set, the predicted action sequence matched the reference exactly for 64.5% of sentences.
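
To make the idea of a textual action sequence and the full-sequence match metric concrete, the following is a minimal illustrative sketch. It is not the authors' implementation: the action names (ADD, STIR, FILTER, WASH), the property keys, the "NAME key=value; ..." serialization, and the helper names `Action`, `serialize`, and `full_match_rate` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One synthesis action. Names and property keys are hypothetical."""
    name: str
    properties: tuple = ()  # ordered (key, value) pairs

    def to_text(self) -> str:
        # Serialize as "NAME key=value key=value" (assumed format).
        parts = [self.name] + [f"{k}={v}" for k, v in self.properties]
        return " ".join(parts)


def serialize(actions):
    """Join actions into a single textual action sequence."""
    return "; ".join(a.to_text() for a in actions)


def full_match_rate(predicted, reference):
    """Fraction of sentences whose predicted sequence equals the reference exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Two reference sentences; the second prediction is deliberately wrong.
reference = [
    serialize([
        Action("ADD", (("material", "water"),)),
        Action("STIR", (("duration", "2 h"),)),
    ]),
    serialize([Action("FILTER")]),
]
predicted = [reference[0], "WASH material=water"]

print(reference[0])                           # ADD material=water; STIR duration=2 h
print(full_match_rate(predicted, reference))  # 0.5
```

Under this strict metric, a prediction counts only if every action and every property in the sentence's sequence agrees with the reference, which is the sense in which a "perfect match" is reported above.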