Can We Quickly Learn to “Translate” Bioactive Molecules with Transformer Models?

Emma Tysinger; Brajesh Rai; Anton Sinitskiy

doi:10.26434/chemrxiv-2022-gln27

Theoretical and Computational Chemistry

Search within Theoretical and Computational Chemistry

Can We Quickly Learn to “Translate” Bioactive Molecules with Transformer Models?

19 December 2022, Version 1

Working Paper

Show author details

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to a combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model, with recent demonstrated success in applications to machine translation and other tasks. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL dataset, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including those absent from the training set. Most generated molecules are highly plausible and follow similar distributions of simple properties (molecular weight, polarity, hydrogen bond donor and acceptor numbers) as the training dataset. By retrospective analysis of the performance of transformer models on ChEMBL subsets of ligands binding to COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to highly active ligands, despite the models having not seen any ligands active against the corresponding protein target during training. Thus, our work demonstrates that transformer models, originally developed to translate texts from one natural language to another, can be easily and quickly extended to “translations” from known molecules active against a given protein target to novel molecules active against the same target, and thereby contribute to hit expansion in drug design.

Keywords

Comments

Comments are not moderated before they are posted, but they can be removed by the site moderators if they are found to be in contravention of our Commenting Policy - please read this policy before you post. Comments should be used for scholarly discussion of the content in question. You can find more information about how to use the commenting feature here .

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Now Published

Can We Quickly Learn to “Translate” Bioactive Molecules with Transformer Models?

Emma P. Tysinger, Brajesh K. Rai, Anton V. Sinitskiy journal article

Journal of Chemical Information and Modeling , Volume 63, Issue 6

Online publication date: Mar 13, 2023

Version History

Dec 19, 2022 Version 1

Metrics

954

547

Views

Downloads

Citations

License

The content is available under CC BY 4.0

DOI

10.26434/chemrxiv-2022-gln27

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content

Can We Quickly Learn to “Translate” Bioactive Molecules with Transformer Models?

Authors

Abstract

Keywords

Comments

Now Published

Version History

Metrics

License

DOI

Author’s competing interest statement

Ethics

Share