ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

Identifying Structure-Property Relationships through SMILES Syntax Analysis With Self-Attention Mechanism

preprint
revised on 14.11.2018 and posted on 14.11.2018 by Shuangjia Zheng, Xin Yan, Yuedong Yang, Jun Xu

Recognizing substructures and their relations embedded in a molecular structure representation is a key process in structure-activity or structure-property relationship (SAR/SPR) studies. A molecular structure can be represented explicitly either as a connection table (CT) or as a linear notation such as SMILES, a language that describes the connectivity of atoms in the molecular structure. Conventional SAR/SPR approaches rely on partitioning the CT into a set of predefined substructures that serve as structural descriptors. In this work, we propose a new method for identifying SAR/SPR through syntax analysis of linear notations (for example, SMILES) with a self-attention mechanism, an interpretable deep learning architecture. The method has been evaluated by predicting chemical properties, toxicity, and bioactivity on experimental data sets. Our results demonstrate that the method outperforms state-of-the-art methods. Moreover, the method produces chemically interpretable results that chemists can use to design and synthesize compounds with improved activity or properties.
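The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea: treat a SMILES string as a token sequence and read per-token self-attention weights as an importance map. It assumes a regex-based SMILES tokenizer and a single head of scaled dot-product self-attention with random, untrained parameters. The tokenizer pattern, `self_attention`, `token_importance`, the embedding size, and the random weights are all hypothetical illustrations, not the authors' model; a real model would learn its parameters from property labels.

```python
import math
import random
import re

# Hypothetical SMILES tokenizer: bracket atoms ([C@@H], [NH4+]), the two-letter
# organic-subset halogens Br and Cl, two-digit ring closures (%10), then any
# single atom letter, ring-closure digit, bond, or branch symbol.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(rows, matrix):
    """Multiply each row vector by a (dim x dim) matrix given as a list of rows."""
    return [[dot(row, col) for col in zip(*matrix)] for row in rows]


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, dim=8, seed=0):
    """Single-head scaled dot-product self-attention over SMILES tokens.

    Embeddings and projection matrices are random stand-ins for learned
    parameters (hypothetical); returns the (n x n) attention weight matrix.
    """
    rng = random.Random(seed)
    embed = {t: [rng.gauss(0, 1) for _ in range(dim)] for t in sorted(set(tokens))}

    def random_matrix():
        return [[rng.gauss(0, 1) / math.sqrt(dim) for _ in range(dim)] for _ in range(dim)]

    w_q, w_k = random_matrix(), random_matrix()
    x = [embed[t] for t in tokens]
    q, k = project(x, w_q), project(x, w_k)
    scale = math.sqrt(dim)
    # Row i holds how strongly token i attends to every token j in the string.
    return [softmax([dot(qi, kj) / scale for kj in k]) for qi in q]


def token_importance(weights):
    """Average attention each token receives; the scores sum to 1."""
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(n)]


if __name__ == "__main__":
    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
    tokens = tokenize_smiles(smiles)
    for tok, score in zip(tokens, token_importance(self_attention(tokens))):
        print(f"{tok:>3} {score:.3f}")
```

With random weights the printed scores carry no chemical meaning. Only after training on property labels might high-scoring tokens point to relevant substructures, such as the ester or carboxylic acid group in this example.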

Funding

National Science and Technology Major Project of the Ministry of Science and Technology of China (2018ZX09735010), GD Frontier & Key Techn. Innovation Program (2015B010109004), GD-NSF (2016A030310228), Natural Science Foundation of China (U1611261), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211)


Email Address of Submitting Author

zhengshj9@mail2.sysu.edu.cn

Institution

Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University

Country

China

ORCID For Submitting Author

0000-0001-9747-4285

Declaration of Conflict of Interest

The authors declare that they have no competing interests.

Version Notes

Fixed typographical mistakes.
