Theoretical and Computational Chemistry

Transmol: Repurposing Language Model for Molecular Generation



Recent advances in convolutional neural networks have inspired the application of deep learning to other disciplines. Although image processing and natural language processing have been the most successful applications, many other areas have also benefited, including computational chemistry in general and drug design in particular. Since 2018, the scientific community has seen a surge of methodologies for generating diverse molecular libraries using machine learning. However, no algorithm has used an attention mechanism for de novo molecular generation. Here we employ a variant of the transformer, a recent NLP architecture, for this purpose. We achieve a statistically significant increase in some of the core metrics of the MOSES benchmark. Furthermore, we describe a novel way of generating libraries by fusing two molecules used as seeds.

