Transformer-CNN: Fast and Reliable Tool for QSAR

Pavel Karpov; Guillaume Godin; Igor Tetko

doi:10.26434/chemrxiv.9961787.v1

We present SMILES embeddings derived from the internal encoder state of a Transformer model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN architecture to these embeddings yields higher-quality QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction is based on an internal consensus. Both the augmentation and the embedding-based transfer learning allow the method to provide good results for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings are available at https://github.com/bigchem/transformer-cnn, and the OCHEM environment (https://ochem.eu) hosts an online implementation.
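
The consensus idea mentioned in the abstract can be illustrated with a minimal sketch: several SMILES strings that encode the same molecule are each scored, and the individual outputs are averaged. This is not the authors' implementation. The function names `toy_predict` and `consensus_predict` are hypothetical, the predictor is a toy stand-in for a trained model, and the SMILES variants are written out by hand rather than generated by an augmentation routine. Reporting the spread next to the mean is an extra added for this sketch.

```python
from statistics import mean, pstdev


def toy_predict(smiles: str) -> float:
    """Hypothetical stand-in for a trained model (an assumption, not the real network).

    The score depends on character order, so different SMILES strings
    of the same molecule give different raw outputs.
    """
    return sum((i + 1) * ord(ch) for i, ch in enumerate(smiles)) % 100 / 10.0


def consensus_predict(smiles_variants, predict):
    """Score every SMILES variant and return (mean, population std dev)."""
    preds = [predict(s) for s in smiles_variants]
    return mean(preds), pstdev(preds)


# Three hand-written SMILES strings that all encode ethanol.
ethanol_variants = ["CCO", "OCC", "C(O)C"]

value, spread = consensus_predict(ethanol_variants, toy_predict)
print(f"consensus prediction: {value:.2f} (spread {spread:.2f})")
```

In practice the variants would come from a cheminformatics toolkit that enumerates non-canonical SMILES, and `toy_predict` would be replaced by the trained network.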
