SMILES-Based Deep Generative Scaffold Decorator for De-Novo Drug Design

21 January 2020, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Molecular generative models trained on small sets of molecules represented as SMILES strings can generate molecules spanning large regions of chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models cannot generate molecules from a given scaffold (i.e., a partially built molecule with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This is possible thanks to a new molecular set pre-processing algorithm that exhaustively cuts all combinations of acyclic bonds of every molecule, obtaining a large number of scaffold-decoration combinations. The algorithm also serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described. First, models trained on a set derived from a small set of Dopamine Receptor D2 (DRD2) active modulators were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted to be active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules), and the resulting scaffold-decoration combinations were filtered to allow only fragment-like decorations. Models trained with this dataset selectively decorated diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the trained models decorated scaffolds using this specific knowledge directly, without the need to add it through other techniques such as reinforcement learning. We envision that this architecture will become a useful addition to the existing architectures for de-novo molecular generation.
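The exhaustive slicing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a molecule is given as a plain graph (atom indices plus bond pairs) rather than a SMILES string, treats a bond as acyclic when removing it disconnects the graph, and assumes a split is valid when one fragment (the scaffold) carries every attachment point while each remaining fragment (a decoration) carries exactly one. The names `components`, `acyclic_bonds`, `slice_molecule`, and the `max_cuts` parameter are hypothetical.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Return the connected components as a list of sets of atom indices."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            atom = stack.pop()
            if atom in comp:
                continue
            comp.add(atom)
            stack.extend(adjacency[atom] - comp)
        seen |= comp
        result.append(comp)
    return result


def acyclic_bonds(n_atoms, bonds):
    """Bonds whose removal disconnects the graph, i.e. bonds not in a ring."""
    base = len(components(n_atoms, bonds))
    return [b for b in bonds
            if len(components(n_atoms, [x for x in bonds if x != b])) > base]


def attachment_points(fragment, cut):
    """Count how many ends of the cut bonds fall inside this fragment."""
    return sum((a in fragment) + (b in fragment) for a, b in cut)


def slice_molecule(n_atoms, bonds, max_cuts=4):
    """Yield (scaffold, decorations) for every combination of acyclic cuts.

    Assumption: the scaffold holds all attachment points and every
    decoration holds exactly one.
    """
    candidates = acyclic_bonds(n_atoms, bonds)
    for k in range(1, min(max_cuts, len(candidates)) + 1):
        for cut in combinations(candidates, k):
            kept = [b for b in bonds if b not in cut]
            fragments = components(n_atoms, kept)
            for scaffold in fragments:
                if attachment_points(scaffold, cut) != k:
                    continue
                others = [f for f in fragments if f is not scaffold]
                if all(attachment_points(f, cut) == 1 for f in others):
                    yield frozenset(scaffold), [frozenset(f) for f in others]


# Toy graph: a six-membered ring (atoms 0-5) with two substituents (atoms 6, 7).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
splits = list(slice_molecule(8, bonds))
print(len(splits))  # 5
```

With a single cut, both fragments carry one attachment point, so each can play the scaffold role. This is why the toy graph yields five scaffold-decoration combinations from only two acyclic bonds, and it hints at how the slicing acts as data augmentation.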

Keywords

deep learning
generative models
SMILES strings
randomized SMILES
recurrent neural networks
fragment-based ligand discovery
data augmentation
matched molecular pairs
ligand series

Supplementary materials

smiles based scaffold decorator additional methods
smiles based scaffold decorator additional figures
