ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information.

STOUT: SMILES to IUPAC Names Using Neural Machine Translation

preprint
revised on 22.03.2021, 10:25 and posted on 23.03.2021, 10:51 by Kohulan Rajan, Achim Zielesny, Christoph Steinbeck

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. The International Union of Pure and Applied Chemistry (IUPAC) established a universally accepted, rule-based naming scheme for chemistry. Because of the complexity of this rule set, assigning a correct chemical name remains challenging for humans, and only a few rule-based cheminformatics toolkits support this task in an automated manner.

Here we present STOUT (SMILES-TO-IUPAC-name translator), a deep-learning neural machine translation approach that generates the IUPAC name for a given molecule from its SMILES string and also performs the reverse translation, i.e., predicting the SMILES string from the IUPAC name. The open system achieves a test accuracy of about 90% correct predictions; even for incorrect predictions, the true and predicted compounds show a remarkable similarity.
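
To make the translation framing concrete, the following is a minimal illustrative sketch and is not taken from the STOUT code base. It assumes a simple regex-based tokenization of SMILES strings (the abstract does not state which input representation or tokenizer STOUT uses) and an exact-string-match definition of "correct prediction" (also an assumption). The names `tokenize_smiles` and `exact_match_accuracy` are hypothetical.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens,
# two-digit ring closures, then any remaining single character.
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens for a sequence model (illustrative)."""
    tokens = _TOKEN.findall(smiles)
    # The tokens must reassemble to the input; this fails e.g. on newlines.
    if "".join(tokens) != smiles:
        raise ValueError("could not tokenize: %r" % smiles)
    return tokens


def exact_match_accuracy(predicted, reference):
    """Fraction of predictions that equal the reference string exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)Cl"))
    print(exact_match_accuracy(["ethanol", "propane"], ["ethanol", "butane"]))
```

Exact string matching is the strictest possible criterion. The similarity between true and predicted compounds mentioned above would need a structure-based comparison, which this sketch does not attempt.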

History

Email Address of Submitting Author

kohulan.rajan@uni-jena.de

Institution

Friedrich-Schiller-University Jena

Country

Germany

ORCID For Submitting Author

0000-0003-1066-7792

Declaration of Conflict of Interest

No Conflict of Interest

Version Notes

STOUT Revision 2
