Construction of order-independent molecular fragments space with vector quantised graph autoencoder

18 April 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Autoencoders are a promising technique for the inverse quantitative structure-activity relationship (QSAR) task. However, undesirable biases, such as dependence on atom ordering, affect the neighbourhood behaviour of an autoencoder's latent space and, consequently, the usefulness of the latent vectors as variables in machine-learning models. Here, we report a graph-based autoencoder that implements a vector quantisation operation (VQGAE). Vector quantisation allows the model to learn vector representations of molecular fragments in an unsupervised manner. Both the latent vectors and the fragment count vectors of VQGAE are permutation invariant and perform well in similarity ranking. In QSAR benchmarks, VQGAE's latent vectors outperform those derived from some previously developed SMILES-based and graph-based autoencoders. Finally, VQGAE was used in the inverse QSAR task to design new A2A adenosine receptor inhibitors.
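To make the permutation-invariance claim concrete, the following is a minimal sketch and not the VQGAE implementation. It assumes that each atom already has an embedding (in the paper these would come from a graph encoder; here they are hand-picked 2D toy values) and that the codebook is a small, hypothetical list of code vectors. Each embedding is snapped to its nearest code vector, and the molecule is then described by how often each code is used. The names `quantise` and `count_vector` are illustrative only.

```python
import random

def quantise(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]

def count_vector(indices, codebook_size):
    """Count how many times each codebook index occurs."""
    counts = [0] * codebook_size
    for i in indices:
        counts[i] += 1
    return counts

# Hypothetical codebook of three code vectors (assumed values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Toy per-atom embeddings for one molecule (assumed values).
atom_embeddings = [(0.1, -0.1), (0.9, 0.2), (0.05, 0.8), (1.1, -0.1)]

counts = count_vector(quantise(atom_embeddings, codebook), len(codebook))

# Reordering the atoms changes the order of the indices, not the counts.
shuffled = atom_embeddings[:]
random.Random(0).shuffle(shuffled)
counts_shuffled = count_vector(quantise(shuffled, codebook), len(codebook))

print(counts, counts_shuffled)
```

Because the count vector only records how often each code is selected, any reordering of the atoms gives the same result. This sketch illustrates the fragment count vectors only; it does not show why the latent vectors are also order-independent.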


Inverse QSAR
Deep learning
Generative models
molecular graphs

