Chemical Engineering and Industrial Chemistry

Intelligent Molecular Identification for High Performance Organosulfide Capture Using Active Machine Learning Algorithm

Authors

  • Yuxiang Chen, East China University of Science and Technology
  • Chuanlei Liu, East China University of Science and Technology
  • Yang An, East China University of Science and Technology
  • Yue Lou, East China University of Science and Technology
  • Yang Zhao, East China University of Science and Technology
  • Cheng Qian, East China University of Science and Technology
  • Hao Jiang, East China University of Science and Technology
  • Kongguo Wu, East China University of Science and Technology
  • Xianghui Zhang, Washington State University
  • Hui Sun, East China University of Science and Technology
  • Di Wu, Washington State University
  • Benxian Shen, East China University of Science and Technology
  • Fahai Cao, East China University of Science and Technology

Abstract

Machine learning and computer-aided approaches significantly accelerate molecular design and discovery in scientific and industrial fields that increasingly rely on data science for efficiency. The most common approach is supervised learning, which requires large labeled datasets. Semi-supervised approaches can exploit unlabeled data to improve modeling performance, but they are limited by the accumulation of prediction errors. Here, to screen solvents for the removal of methyl mercaptan (MeSH), an organosulfur impurity in natural gas, we constructed a computational framework, molecular active selection machine learning (MASML), that integrates molecular similarity search with active learning. The framework identifies the optimal set of molecules through molecular similarity search and iterative addition to the training dataset. Of the 126,068 compounds in the initial dataset, three were identified as promising for MeSH capture: benzylamine (BZA), p-methoxybenzylamine (PZM), and N,N-diethyltrimethylenediamine (DEAPA). Experiments confirmed the effectiveness of the framework for efficient molecular design and identification for MeSH capture; in particular, DEAPA shows a Henry's law constant 89.4% lower than that of methyl diethanolamine (MDEA).
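
The idea of combining similarity search with iterative growth of the training set can be illustrated with a minimal sketch. It is not the authors' MASML implementation: the surrogate model is omitted, and the fingerprints (sets of "on" bit indices), the molecule names, the `oracle` labeling function, and all parameter names are hypothetical placeholders. Lower labels are treated as better, by analogy with a lower Henry's law constant.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical toy fingerprint: the set of "on" bit indices for a molecule.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def active_selection(
    pool: Dict[str, Fingerprint],
    labeled: Dict[str, float],
    oracle: Callable[[str], float],
    n_rounds: int = 3,
    batch_size: int = 2,
    n_seeds: int = 2,
) -> Dict[str, float]:
    """Grow the labeled set by repeatedly labeling the unlabeled pool
    molecules most similar to the current best (lowest-label) molecules.

    Assumes every key of `labeled` is also a key of `pool`.
    """
    labeled = dict(labeled)  # work on a copy; do not mutate the caller's dict
    for _ in range(n_rounds):
        unlabeled = [m for m in pool if m not in labeled]
        if not unlabeled:
            break
        # Seeds: the best molecules found so far (lowest label first).
        seeds = sorted(labeled, key=lambda m: labeled[m])[:n_seeds]

        def score(m: str) -> float:
            # Similarity of candidate m to its closest seed.
            return max(tanimoto(pool[m], pool[s]) for s in seeds)

        batch = sorted(unlabeled, key=score, reverse=True)[:batch_size]
        for m in batch:
            labeled[m] = oracle(m)  # e.g. a simulation or an experiment
    return labeled


# Toy data (entirely made up for illustration).
pool = {
    "A": frozenset({1, 2, 3}),
    "B": frozenset({1, 2, 4}),
    "C": frozenset({7, 8, 9}),
    "D": frozenset({1, 2, 3, 4}),
    "E": frozenset({8, 9, 10}),
}
toy_labels = {"A": 1.0, "B": 0.8, "C": 5.0, "D": 0.5, "E": 6.0}

result = active_selection(
    pool,
    labeled={"A": 1.0, "C": 5.0},
    oracle=toy_labels.__getitem__,
    n_rounds=1,
    batch_size=2,
    n_seeds=1,
)
```

In this toy run the single seed is "A" (the lowest label so far), so the round labels the two pool molecules closest to it and leaves the dissimilar "E" unlabeled. A fuller version would presumably retrain a predictive model on the enlarged set after each round.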

Content

Manuscript 11-25-2021.pdf

Supplementary material

Supporting Information 11-25-2021.pdf
Supporting Information - Intelligent Molecular Identification for High Performance Organosulfide Capture Using Active Machine Learning Algorithm
Contents: 1. Methods; 2. Tables; 3. Figures; 4. References