Theoretical and Computational Chemistry

PySMILESUtils – Enabling deep learning with the SMILES chemical language



Recent years have seen considerable interest in using the Simplified Molecular Input Line Entry System (SMILES) chemical language as input for deep learning architectures that solve chemical tasks. Successful applications have been demonstrated in, for example, de novo molecular design, quantitative structure-activity relationship (QSAR) modelling, forward reaction prediction and single-step retrosynthetic planning. PySMILESUtils aims to enable these tasks by providing ready-to-use and adaptable Python classes for tokenization, augmentation, and dataset and dataloader creation. Classes for handling datasets larger than memory and for speeding up training by minimizing padding are also provided. The framework subclasses the PyTorch Dataset and DataLoader classes but should be adaptable to other deep learning frameworks. The project is open source under a permissive license and is available on GitHub:
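To make the tokenization and padding-minimization ideas concrete, here is a minimal, framework-free Python sketch. It does not use the PySMILESUtils API. The regular expression (a pattern commonly used for SMILES tokenization), the special tokens and all function names (`tokenize`, `build_vocab`, `encode`, `pad_batches`, `length_sorted_order`, `count_pad`) are hypothetical, chosen for illustration. Sorting sequences by length before batching is one simple way to reduce padding; the package's own strategy may differ.

```python
import re
from typing import Dict, Iterable, List, Sequence

# A regular expression commonly used to split SMILES into tokens
# (assumed for illustration; not necessarily PySMILESUtils' own pattern).
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>"]  # hypothetical special tokens; <pad> gets id 0


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into atom, bond, branch and ring-closure tokens."""
    tokens = SMILES_REGEX.findall(smiles)
    # Every character must be covered, otherwise the pattern missed something.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def build_vocab(smiles_list: Iterable[str]) -> Dict[str, int]:
    """Map each token seen in the data to an integer id, special tokens first."""
    seen = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + seen)}


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a SMILES string to ids, wrapped in begin/end-of-sequence tokens."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokenize(smiles)] + [vocab["<eos>"]]


def pad_batches(seqs: Sequence[List[int]], order: Iterable[int],
                batch_size: int, pad_id: int = 0) -> List[List[List[int]]]:
    """Cut `order` into batches and pad each batch only to its own longest sequence."""
    order = list(order)
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        width = max(len(seqs[i]) for i in idx)
        batches.append([seqs[i] + [pad_id] * (width - len(seqs[i])) for i in idx])
    return batches


def length_sorted_order(seqs: Sequence[List[int]]) -> List[int]:
    """Indices sorted by sequence length, so similar lengths share a batch."""
    return sorted(range(len(seqs)), key=lambda i: len(seqs[i]))


def count_pad(batches: List[List[List[int]]], pad_id: int = 0) -> int:
    """Total number of padding tokens across all batches."""
    return sum(row.count(pad_id) for batch in batches for row in batch)


if __name__ == "__main__":
    data = ["C", "CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "O", "CN"]
    vocab = build_vocab(data)
    seqs = [encode(s, vocab) for s in data]
    print("pad tokens, input order:  ", count_pad(pad_batches(seqs, range(len(seqs)), 2)))
    print("pad tokens, length-sorted:", count_pad(pad_batches(seqs, length_sorted_order(seqs), 2)))
```

Sorting globally, as here, makes batch composition deterministic; a training pipeline would typically shuffle within or across length buckets. In PyTorch, the per-batch padding step would usually live in a `collate_fn` passed to `torch.utils.data.DataLoader`.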


[Full text (PDF): Bjerrum 2021, PySMILESUtils – Enabling deep learning with the SMILES chemical language]

Supplementary weblinks

PySMILESUtils code
The GitHub repository for the PySMILESUtils package.