ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.

Human-Readable SMILES: Translating Cheminformatics to Chemistry

preprint
submitted on 18.03.2021, 08:28 and posted on 19.03.2021, 05:16 by Diego Garay-Ruiz, Carles Bo
Molecular string representations are a key asset in cheminformatics and are becoming increasingly relevant to the general chemical community, due to the steadily growing impact of Big Data and Machine Learning. Among the string representations proposed so far, SMILES (Simplified Molecular Input Line Entry Specification) are probably the de facto standard today. Despite their convenience for storing unique molecular structures in databases, however, SMILES are not easy for most chemists to interpret: it is difficult for an untrained chemist to picture the molecule that a SMILES string describes.

To mitigate this, we propose the HumanSMILES algorithm: a simple procedure that translates a SMILES string into a more interpretable name, inspired by the common abbreviations and names used in general organic chemistry. Human-Readable SMILES can describe linear structures and general non-fused cyclic structures, using a set of naming rules that combines automated processing with chemical knowledge. The code is available as open source, along with a web application.
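As a rough illustration of the abbreviation idea (not the HumanSMILES rule set, which this abstract does not give), the sketch below replaces self-contained SMILES branches with common substituent labels. The `ABBREVIATIONS` table and the `humanize_branches` function are hypothetical. The sketch assumes each parenthesized branch is a complete terminal substituent whose ring closures open and close inside it; it does not handle fused rings or fully validate the SMILES.

```python
# Hypothetical abbreviation table: SMILES of a terminal substituent -> common label.
# Illustrative assumption only; not the HumanSMILES naming rules.
ABBREVIATIONS: dict[str, str] = {
    "C": "Me",
    "CC": "Et",
    "CCC": "nPr",
    "C(C)C": "iPr",
    "C(C)(C)C": "tBu",
    "c1ccccc1": "Ph",
    "C(=O)O": "COOH",
    "OC": "OMe",
    "C(F)(F)F": "CF3",
}


def humanize_branches(smiles: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace each parenthesized branch whose content matches `table` with its label.

    Branches that do not match are kept, but their own inner branches are
    processed recursively. Assumes branches are self-contained substituents.
    """
    out: list[str] = []
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch != "(":
            out.append(ch)
            i += 1
            continue
        # Scan forward to the parenthesis that closes this branch.
        depth = 0
        j = i
        while j < len(smiles):
            if smiles[j] == "(":
                depth += 1
            elif smiles[j] == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if depth != 0:
            raise ValueError(f"unbalanced parentheses in {smiles!r}")
        inner = smiles[i + 1 : j]
        label = table.get(inner)
        # Use the label if the whole branch is a known substituent,
        # otherwise recurse to abbreviate any nested branches.
        out.append(f"({label})" if label else f"({humanize_branches(inner, table)})")
        i = j + 1
    return "".join(out)
```

For example, `humanize_branches("OC(=O)c1ccc(C(F)(F)F)cc1")` returns `OC(=O)c1ccc(CF3)cc1`. Plain string matching is used here only for brevity; a more robust approach would work on a parsed molecular graph, since the same substituent can be written as several different SMILES strings.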

Funding

CTQ2017-88777-R

2017SGR00290

Email Address of Submitting Author

dgaray@iciq.es

Institution

Institute of Chemical Research of Catalonia

Country

Spain

ORCID For Submitting Author

0000-0003-0744-0562

Declaration of Conflict of Interest

No conflict of interest