Transformer-CNN: Fast and Reliable Tool for QSAR

Pavel Karpov; Guillaume Godin; Igor Tetko

doi:10.26434/chemrxiv.9961787.v1

Theoretical and Computational Chemistry

21 October 2019, Version 1

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We present SMILES embeddings derived from the internal encoder state of a Transformer model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN architecture to these embeddings yields higher-quality QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction rests on an internal consensus. Both the augmentation and the embedding-based transfer learning allow the method to give good results on small datasets. We discuss the reasons for this effectiveness and outline future directions for developing the method. The source code and the embeddings are available at https://github.com/bigchem/transformer-cnn, and the OCHEM environment (https://ochem.eu) hosts an online implementation.
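The idea of a consensus over augmented SMILES at inference time can be illustrated with a minimal sketch. It assumes the consensus is a plain arithmetic mean over several SMILES spellings of the same molecule; the abstract does not state the aggregation rule. The names `consensus_predict` and `toy_model`, and all numeric values, are hypothetical and are not taken from the authors' code.

```python
from statistics import mean
from typing import Callable, Sequence


def consensus_predict(
    predict_one: Callable[[str], float],
    smiles_variants: Sequence[str],
) -> float:
    """Average a per-SMILES prediction over several spellings of one molecule.

    Assumption: the consensus is an unweighted mean of the individual outputs.
    """
    if not smiles_variants:
        raise ValueError("need at least one SMILES string")
    return mean(predict_one(s) for s in smiles_variants)


# Three valid SMILES spellings of ethanol, written by hand for illustration.
# In practice a cheminformatics toolkit would generate such variants.
ethanol_variants = ["CCO", "OCC", "C(O)C"]

# Hypothetical stand-in for a trained model: made-up outputs that differ
# slightly depending on how the same molecule is written.
_toy_outputs = {"CCO": 1.0, "OCC": 2.0, "C(O)C": 3.0}


def toy_model(smiles: str) -> float:
    return _toy_outputs[smiles]


prediction = consensus_predict(toy_model, ethanol_variants)
print(prediction)
```

Averaging makes the final value independent of which particular spelling of the molecule is supplied, which is the point of augmenting at inference as well as at training time.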

Keywords

Transformer model

Convolutional neural networks

Character-based models

Cheminformatics

Regression

Classification

Now Published

Transformer-CNN: Swiss knife for QSAR modeling and interpretation

Pavel Karpov, Guillaume Godin, Igor V. Tetko (journal article)

Journal of Cheminformatics, Volume 12, Issue 1

Online publication date: Mar 18, 2020

Version History

Oct 21, 2019 Version 1

Metrics

Views: 4,698

Downloads: 957

License

The content is available under CC BY NC ND 4.0

Author’s competing interest statement

No conflict of interest.
