Theoretical and Computational Chemistry

Chemformer: A Pre-Trained Transformer for Computational Chemistry

Authors

Abstract

Transformer models coupled with the Simplified Molecular Input Line Entry System (SMILES) have recently proven to be a powerful combination for solving challenges in cheminformatics. These models, however, are often developed specifically for a single application and can be very resource-intensive to train. In this work we present Chemformer, a Transformer-based model which can be quickly applied to both sequence-to-sequence and discriminative cheminformatics tasks. Additionally, we show that self-supervised pre-training can improve performance and significantly speed up convergence on downstream tasks. On direct synthesis and retrosynthesis prediction benchmark datasets we report state-of-the-art top-1 accuracy. We also improve on existing approaches for a molecular optimisation task and show that Chemformer can be optimised on multiple discriminative tasks simultaneously. Models, datasets and code will be made available after publication.
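As a rough illustration of the self-supervised denoising objective mentioned above, the Python sketch below tokenises a SMILES string and corrupts it by masking random tokens; the model would then be trained to reconstruct the original string. The tokeniser regex, the <MASK> sentinel, and the mask rate are illustrative assumptions, not Chemformer's exact implementation.

```python
# Minimal sketch of denoising-style pre-training data on SMILES.
# Assumptions (not from the paper): the tokeniser regex, the <MASK>
# sentinel, and the 15% mask rate are illustrative choices only.
import random
import re

# Simple SMILES tokeniser: multi-character tokens (bracket atoms, Cl, Br)
# are matched before single characters.
SMILES_TOKEN_RE = re.compile(r"(\[[^\]]+\]|Br|Cl|@@|[=#$/\\().+%-]|\d|[A-Za-z])")

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into a list of chemical tokens."""
    return SMILES_TOKEN_RE.findall(smiles)

def mask_tokens(tokens: list[str], mask_rate: float = 0.15, seed: int = 0) -> list[str]:
    """Replace a random subset of tokens with a <MASK> sentinel.

    A sequence-to-sequence model is then trained to recover the
    original token sequence from this corrupted input."""
    rng = random.Random(seed)
    return [t if rng.random() > mask_rate else "<MASK>" for t in tokens]

if __name__ == "__main__":
    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
    tokens = tokenize(smiles)
    corrupted = mask_tokens(tokens)
    print("input :", " ".join(corrupted))
    print("target:", " ".join(tokens))
```

In this setup the corrupted sequence is the encoder input and the uncorrupted SMILES is the decoder target, so pre-training requires no labels beyond the molecules themselves.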

Content

Irwin2021 - Chemformer - A Pre-Trained Transformer for Computational Chemistry.pdf

Supplementary material

Irwin2021 - Chemformer - A Pre-Trained Transformer for Computational Chemistry - Supplementary Information.pdf
Supplementary Information for Chemformer: A Pre-Trained Transformer for Computational Chemistry
Supplementary Tables and Results

Supplementary weblinks

https://github.com/MolecularAI/Chemformer
Source code for Chemformer