Theoretical and Computational Chemistry

Neural Language Modeling for Molecule Generation

Sanjar Adilov, Romanovsky Institute of Mathematics


Generative neural networks have shown promising results in de novo drug design. Recent studies suggest that an efficient way to produce novel molecules matching target properties is to model SMILES sequences with deep learning, much as natural language processing models text. In this paper, we survey machine learning methods for SMILES-based language modeling and report our benchmarking results on a standardized subset of the ChEMBL database.
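
To make the language-modeling analogy concrete, here is a minimal sketch in Python. It is not the RNN baseline or any method from the paper: it swaps in a count-based bigram model purely for illustration. The toy corpus, the `<bos>`/`<eos>` markers, and the function names `tokenize`, `fit_bigram`, and `sample` are all assumptions introduced for this example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sequence markers; chosen so they cannot clash with SMILES symbols.
BOS, EOS = "<bos>", "<eos>"

def tokenize(smiles):
    """Split a SMILES string into characters, keeping 'Cl' and 'Br' whole."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

def fit_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for smiles in corpus:
        seq = [BOS] + tokenize(smiles) + [EOS]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts, rng, max_len=50):
    """Generate one string token by token, as a neural language model would."""
    token, out = BOS, []
    for _ in range(max_len):
        followers = counts[token]
        token = rng.choices(list(followers), weights=list(followers.values()))[0]
        if token == EOS:
            break
        out.append(token)
    return "".join(out)

# Toy corpus (assumption): ethanol, ethylamine, chloropropane-like chain, benzene.
corpus = ["CCO", "CCN", "CCCl", "c1ccccc1"]
model = fit_bigram(corpus)
generated = sample(model, random.Random(0))
print(generated)
```

A neural model replaces the count table with a learned conditional distribution over the next token, but the sampling loop has the same shape. Strings produced this way are not guaranteed to be valid SMILES; checking validity would need a cheminformatics toolkit, which this sketch deliberately leaves out.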

Version notes

Version 1, Moleculegen-ML-1.1.0, RNN baseline.



Supplementary weblinks