Human-Readable SMILES: Translating Cheminformatics to Chemistry


Abstract

Molecular string representations are a key asset in cheminformatics and are becoming increasingly relevant to the wider chemical community, owing to the steadily growing impact of big data and machine learning. Among the many string representations proposed so far, SMILES (Simplified Molecular Input Line Entry System) is arguably the de facto standard today. However, despite their convenience for storing unique molecular structures in databases, SMILES strings are not easy for most chemists to read: a chemist without specific training will find it difficult to picture the molecule that a given SMILES describes.
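To make the readability problem concrete, the short Python sketch below splits a SMILES string into its tokens: atoms, bond symbols, branch parentheses and ring-closure digits. The regular expression follows a pattern commonly used to tokenize SMILES for machine-learning models; the function names `tokenize` and `summarize` are hypothetical and are not part of HumanSMILES. Aspirin is used as the example input.

```python
import re

# Tokenizer pattern commonly used for SMILES in ML work: bracket atoms first,
# then two-letter organic-subset atoms (Br, Cl), single atoms, bonds,
# branch parentheses and ring-closure labels (single digits or %NN).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens (hypothetical helper)."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters that the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def summarize(smiles: str) -> dict[str, int]:
    """Count heavy atoms, ring closures and branches in a SMILES string."""
    tokens = tokenize(smiles)
    # Atoms are alphabetic tokens or bracket atoms such as [NH4+].
    atoms = [t for t in tokens if t[0].isalpha() or t.startswith("[")]
    # Each ring closure is written as a pair of matching labels.
    ring_labels = [t for t in tokens if t.isdigit() or t.startswith("%")]
    return {
        "heavy_atoms": len(atoms),
        "ring_closures": len(ring_labels) // 2,
        "branches": tokens.count("("),
    }


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(tokenize(aspirin))
print(summarize(aspirin))
```

Even for a familiar drug like aspirin, the reader has to track parentheses and matching ring digits to reconstruct the acetoxy and carboxylic acid groups on the benzene ring, which is the kind of mental parsing a human-readable name aims to avoid.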

To mitigate this, we propose the HumanSMILES algorithm: a simple procedure that translates a SMILES string into a more interpretable name, inspired by the abbreviations and names commonly used in organic chemistry. Human-Readable SMILES can describe linear structures and general non-fused cyclic structures, using a set of naming rules that combines automated processing with chemical knowledge. The code is available as open source, together with a web application.
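The idea of replacing raw SMILES with familiar chemical shorthand can be illustrated with a deliberately simplified Python sketch. This is not the HumanSMILES algorithm, which applies naming rules to linear and non-fused cyclic substructures within a molecule; the sketch only maps whole, dot-separated components to common abbreviations. The `COMMON_NAMES` table and the `humanize` function are hypothetical, and the lookup assumes each component is written exactly as in the table (a practical tool would first canonicalize the SMILES, for example with a toolkit such as RDKit).

```python
# Hypothetical lookup table: exact SMILES strings (as written here, not
# canonicalized) mapped to abbreviations commonly used by chemists.
COMMON_NAMES = {
    "O": "H2O",
    "CO": "MeOH",
    "CCO": "EtOH",
    "CC(=O)O": "AcOH",
    "CCOC(C)=O": "EtOAc",
    "CC#N": "MeCN",
    "CS(C)=O": "DMSO",
    "ClCCl": "DCM",
}


def humanize(smiles: str) -> str:
    """Replace each '.'-separated component with its abbreviation, if known.

    Unknown components are returned unchanged, so the output never hides
    structural information it cannot name.
    """
    components = smiles.split(".")
    return " + ".join(COMMON_NAMES.get(c, c) for c in components)


print(humanize("CCO.O"))
print(humanize("CCOC(C)=O.c1ccccc1"))
```

Falling back to the original SMILES for unrecognized components mirrors the general goal stated above: the output should be easier to read where a familiar name exists, without discarding information where it does not.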


Supplementary material

HumanSMILES SI