Struct2IUPAC -- Transformer-Based Artificial Neural Network for the Conversion Between Chemical Notations

Lev Krasnov; Ivan Khokhlov; Maxim Fedorov; Sergey Sosnin

doi:10.26434/chemrxiv.13274732.v2

Theoretical and Computational Chemistry

12 January 2021, Version 2

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Providing IUPAC chemical names is necessary for chemical information exchange. We developed a Transformer-based artificial neural architecture to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. Our models demonstrated performance comparable to that of rule-based solutions. We showed that the accuracy, computation speed, and robustness of the models allow us to use them in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development while maintaining the same performance. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones. A demonstration of the Struct2IUPAC model is available online on the Syntelly platform: https://app.syntelly.com/smiles2iupac

Keywords

machine learning

chemical nomenclature

Artificial Intelligence

Transformer

IUPAC Nomenclature

Version History

Jan 12, 2021 Version 2

Nov 24, 2020 Version 1

Version Notes

Version 2.0: In this version, we made corrections and improvements. We added Table 1, which describes the models’ accuracy for various beam sizes. We added the distribution of the number of name variations generated by the Transformer (Figure 8). We fixed a bug that led to a non-uniform distribution of the 100,000 molecules selected from our test set, and we prepared a new 100k subset with a uniform distribution. We recalculated the performance of the direct and reverse models on the new 100k dataset (Table 1) and the dependence of model accuracy on SMILES length (Figure 4) for the uniform test set. We also redrew Figure 5 to follow the distribution of the new 100k dataset. Performance on the new test set remains very high (although not perfect) and comparable to that of algorithm-based solutions.
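The subset selection and the accuracy-versus-length analysis mentioned in these notes can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "uniform" means every test molecule is equally likely to be selected, and it scores a prediction by exact string match with a reference name, which is a simplification because this page does not state the evaluation criterion. The function names, the bin width, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def uniform_subset(items, k, seed=0):
    """Draw k items without replacement; every item is equally likely (assumed meaning of 'uniform')."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(items, k)


def accuracy_by_length(records, bin_width=10):
    """Group (smiles, predicted_name, reference_name) records by SMILES length and
    return the fraction of exact matches per bin, keyed by the bin's lower bound."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for smiles, predicted, reference in records:
        bin_start = (len(smiles) // bin_width) * bin_width
        totals[bin_start] += 1
        # Exact string match is an illustrative criterion only.
        hits[bin_start] += int(predicted == reference)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy records, not data from the preprint.
toy_records = [
    ("CCO", "ethanol", "ethanol"),
    ("CC(=O)O", "acetic acid", "ethanoic acid"),
    ("c1ccccc1C(=O)O", "benzoic acid", "benzoic acid"),
]

if __name__ == "__main__":
    subset = uniform_subset(toy_records, 2, seed=1)
    print(len(subset), "records sampled")
    print(accuracy_by_length(toy_records))
```

The second toy record shows why exact matching can undercount: a molecule may have several valid names, so a stricter or looser criterion would change the per-bin numbers.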

Metrics

Views: 6,666

Downloads: 981

License

The content is available under CC BY-NC-ND 4.0.

Author’s competing interest statement

Maxim Fedorov and Sergey Sosnin are co-founders of Syntelly LLC. Lev Krasnov and Ivan Khokhlov are employees of Syntelly LLC.
