ChemRxiv
These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in news media as established information. For more information, please see our FAQs.

Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions

preprint
submitted on 26.03.2021, 16:45 and posted on 29.03.2021, 10:14 by Djork-Arné Clevert, Tuan Le, Robin Winter, Floriane Montanari

Automatic recognition of the molecular content of a molecule’s graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model that combines a deep convolutional neural network, which learns from molecule depictions, with a pre-trained decoder that translates the latent representation into the SMILES representation of the molecule. This combination allows a molecular structure to be inferred precisely from an image. Our rigorous evaluation shows that Img2Mol correctly translates up to 88% of the molecular depictions into their SMILES representation. A pre-trained version of Img2Mol is publicly available on GitHub for non-commercial users.
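
To make the reported translation rate concrete, the following is a minimal sketch of how an exact-match score over predicted SMILES strings could be computed. It is not the authors' evaluation code: the function name `exact_match_accuracy`, the toy data, and the use of plain string comparison are all assumptions made for illustration, and the evaluation protocol itself is not described on this page.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference SMILES string.

    Assumption: both lists were already canonicalized by the same toolkit.
    Plain string comparison does not recognise equivalent SMILES spellings.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty set")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Toy data, invented for illustration only (not results from the paper).
reference = ["CCO", "c1ccccc1", "CC(=O)O", "CN"]
predicted = ["CCO", "c1ccccc1", "CC(O)=O", "CN"]

accuracy = exact_match_accuracy(predicted, reference)
print(f"{accuracy:.0%}")  # prints 75%
```

The third toy prediction, `CC(O)=O`, denotes the same molecule as `CC(=O)O` but is counted as a miss here. This is why a score of this kind is only meaningful if both sides are canonicalized with a cheminformatics toolkit before comparison.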

Email Address of Submitting Author

djork-arne.clevert@bayer.com

Institution

Bayer AG

Country

Germany

ORCID For Submitting Author

0000-0003-4191-2156

Declaration of Conflict of Interest

No conflict of interest

Version Notes

Version 1.0
