Abstract
Autoencoders are a promising technique for the inverse quantitative structure-activity relationship (QSAR) task. However, undesirable biases, such as dependence on atom ordering, affect the neighbourhood behaviour of the autoencoders’ latent space and, consequently, the usefulness of the latent vectors as variables in machine-learning models. Here, we report a graph-based autoencoder that implements a vector quantisation operation (VQGAE). This operation allows vectorial representations of molecular fragments to be learned in an unsupervised manner. The latent vectors or fragment count vectors of VQGAE are permutation invariant and perform well in similarity ranking. In QSAR benchmarks, the VQGAE latent vectors outperform those derived from several previously developed SMILES-based and graph-based autoencoders. Finally, VQGAE was applied to the inverse QSAR task to design new A2A adenosine receptor inhibitors.
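To illustrate why a count vector obtained by vector quantisation does not depend on atom ordering, the following minimal sketch may help. It is not the VQGAE implementation: the two-dimensional atom vectors, the hand-picked three-entry codebook, and the function names `quantise` and `fragment_counts` are all assumptions made for illustration, whereas in a trained model the atom vectors would come from a graph encoder and the codebook would be learned.

```python
from collections import Counter


def quantise(atom_vectors, codebook):
    """Map each atom vector to the index of its nearest codebook vector."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in atom_vectors
    ]


def fragment_counts(atom_vectors, codebook):
    """Count how many atoms were assigned to each codebook entry."""
    counts = Counter(quantise(atom_vectors, codebook))
    return [counts.get(k, 0) for k in range(len(codebook))]


# Hypothetical codebook with three "fragment" vectors (illustrative values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical encoder outputs for a four-atom molecule.
atoms = [(0.9, 0.1), (0.1, 0.8), (0.05, 0.0), (1.1, -0.1)]

# The same atoms listed in a different order.
shuffled = [atoms[2], atoms[0], atoms[3], atoms[1]]

print(quantise(atoms, codebook))
print(fragment_counts(atoms, codebook))
print(fragment_counts(shuffled, codebook))
```

Because counting discards the order in which the code indices appear, both atom orderings yield the same count vector, while the sequence of indices returned by `quantise` still depends on the ordering.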