Working Paper
Authors
- Jiangcheng Xu, Zhejiang University of Technology & Hangzhou Vocational and Technical College
- Yun Zhang, Zhejiang University of Technology
- Jiale Han, Zhejiang University of Technology
- Haoran Qiao, Shanghai University of Electric Power
- Chengyun Zhang, Zhejiang University of Technology
- Jing Tang, Zhejiang University of Technology
- Shen Xi, Zhejiang University of Technology
- Bin Sun, Zhejiang University of Technology
- Silong Zhai, Zhejiang University of Technology
- Xinqiao Wang, Zhejiang University of Technology
- Yejian Wu, Zhejiang University of Technology
- Weike Su, Zhejiang University of Technology
- Hongliang Duan, Zhejiang University of Technology
Abstract
Predicting and proposing reaction mechanisms, as well as inferring reaction intermediates, remain major challenges in the development of modern organic chemistry. Herein, a model from Natural Language Processing (NLP) was employed for the first time to learn and perform intermediate prediction, framed as a language translation task. Radical cascade cyclization is widely used in life science and pharmaceutical research, yet the regioselectivity of radical attack is difficult to predict. To tackle this challenge, the model was trained on a self-built dataset, and transfer learning was used to overcome the limited amount of available data. The NLP Transformer model performs well, with high accuracy, and provides efficient guidance for understanding reaction mechanisms. Because no manual encoding of rules is required, the approach offers a favorable tool for the challenging problem of computational mechanism inference in organic chemistry.
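Framing intermediate prediction as translation means each training example becomes a pair of token sequences: the "source sentence" describes the reactants and the "target sentence" describes the intermediate. The abstract does not specify the molecular representation, so the sketch below assumes SMILES strings split with the tokenization regex popularized by the Molecular Transformer (Schwaller et al.). The helper names `tokenize_smiles` and `make_translation_pair` are hypothetical, and the example molecules (an N-aryl acrylamide plus a methyl radical, and the resulting carbon-centered radical adduct) are illustrative only, not entries from the authors' dataset.

```python
import re

# Assumed SMILES tokenization regex (Molecular Transformer style); this is
# an illustrative choice, not a detail confirmed by the paper.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens (bracket atoms such as [CH3] stay whole)."""
    tokens = SMILES_REGEX.findall(smiles)
    # findall silently skips unmatched characters, so verify nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable SMILES: {smiles!r}")
    return tokens


def make_translation_pair(reactants: str, intermediate: str) -> tuple[str, str]:
    """Hypothetical helper: build a space-separated (source, target) pair
    in the format commonly fed to sequence-to-sequence Transformers."""
    source = " ".join(tokenize_smiles(reactants))
    target = " ".join(tokenize_smiles(intermediate))
    return source, target


if __name__ == "__main__":
    # Illustrative pair: methyl radical adding to the terminal CH2 of an
    # N-methyl-N-phenyl methacrylamide gives a tertiary carbon radical.
    src, tgt = make_translation_pair(
        "C=C(C)C(=O)N(C)c1ccccc1.[CH3]",
        "CC[C](C)C(=O)N(C)c1ccccc1",
    )
    print("source:", src)
    print("target:", tgt)
```

Keeping bracket atoms such as `[C]` as single tokens matters here, because the radical center is exactly the site whose regioselectivity the model has to learn.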

Supplementary material

New Application of Natural Language Processing (NLP) for Chemist: Predicting Intermediate and Providing an Effective Direction for Mechanism Inference
This is the supporting information for: New Application of Natural Language Processing (NLP) for Chemist: Predicting Intermediate and Providing an Effective Direction for Mechanism Inference.