These are preliminary reports that have not been peer-reviewed. They should not be regarded as conclusive, be used to guide clinical practice or health-related behavior, or be reported in news media as established information.

Transmol: Repurposing Language Model for Molecular Generation

submitted on 01.04.2021, 08:40 and posted on 02.04.2021, 04:53 by Rustam Zhumagambetov, Vsevolod A. Peshkov, Siamac Fazli
Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Although image processing and natural language processing (NLP) have proved the most successful application areas, many other fields have also benefited, including computational chemistry in general and drug design in particular. Since 2018, the scientific community has seen a surge of machine-learning methods for generating diverse molecular libraries. However, no algorithm had yet used an attention mechanism for de novo molecular generation. Here we employ a variant of the transformer, a recent NLP architecture, for this purpose. We achieve a statistically significant increase in some of the core metrics of the MOSES benchmark. Furthermore, we describe a novel way of generating libraries by fusing two molecules used as seeds.
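In the standard transformer, the attention mechanism mentioned above is scaled dot-product attention, softmax(QKᵀ/√d_k)V. The following is a minimal, dependency-free sketch of that formula, assuming tiny list-of-lists matrices. It is not the Transmol implementation. The toy two-dimensional token embeddings for the SMILES string `CCO` in the usage example are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) == 0.0 for masked slots
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix,
                                 causal: bool = False) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    q, k: one row per token, each of length d_k; v: one row per token.
    If causal is True, position i may only attend to positions j <= i,
    as in an autoregressive decoder.
    """
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for i, q_row in enumerate(q):
        scores = []
        for j, k_row in enumerate(k):
            if causal and j > i:
                scores.append(float("-inf"))  # hide future tokens
            else:
                scores.append(scale * sum(a * b for a, b in zip(q_row, k_row)))
        weights = softmax(scores)
        # Each output row is the attention-weighted average of the value rows.
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Hypothetical embeddings for the tokens of "CCO": C, C, O.
    x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for row in scaled_dot_product_attention(x, x, x, causal=True):
        print(row)
```

Passing the same matrix as Q, K and V gives self-attention. The causal flag shows how a generic decoder keeps each token from attending to tokens that have not been generated yet. Whether and where Transmol applies such a mask is not stated here.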


Nazarbayev University

Declaration of Conflict of Interest: no conflict of interest.