Molecular set representation learning

Maria Boulougouri; Pierre Vandergheynst; Daniel Probst

doi:10.26434/chemrxiv-2023-fk7kf

Biological and Medicinal Chemistry

11 October 2023, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Computational representations of molecules can take many forms, including graphs, string-encodings of graphs, binary vectors, or learned embeddings in the form of real-valued vectors. These representations are then used in downstream classification and regression tasks using a wide range of machine-learning models. However, existing models come with limitations, such as the requirement for clearly defined chemical bonds, which often do not represent the true underlying nature of a molecule. Here, we propose a framework for molecular machine learning tasks based on set representation learning. We show that learning on sets of atomic invariants alone reaches the performance of state-of-the-art graph-based models on the most-used chemical benchmark data sets and that introducing a set representation layer into graph neural networks can surpass the performance of established methods in the domains of chemistry, biology, and materials science. We introduce specialised set representation-based neural network architectures for reaction yield and protein-ligand binding affinity prediction. Overall, we show that the technique we denote molecular set representation learning is both an alternative and an extension to graph neural network architectures for machine learning tasks on molecules, molecule complexes, and chemical reactions.
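
The idea of learning on a set of atomic invariants, without any bond list, can be illustrated with a minimal sketch. It assumes a simple sum-pooling encoder with random, untrained weights; this is an illustrative stand-in and not necessarily the architecture used in the manuscript. The invariant tuple layout, the hand-written ethanol example, and all function names are hypothetical.

```python
import math
import random

# Hypothetical per-atom invariants:
# (atomic number, heavy-atom degree, attached hydrogens, formal charge).
# Heavy atoms of ethanol (CH3-CH2-OH), written by hand; no bonds are stored.
ETHANOL = [(6, 1, 3, 0), (6, 2, 2, 0), (8, 1, 1, 0)]


def make_weights(n_in, n_out, seed):
    """Random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def phi(atom, weights):
    """Embed one atom independently of all others (linear map + ReLU)."""
    return [max(0.0, math.fsum(w * x for w, x in zip(row, atom)))
            for row in weights]


def encode_set(atoms, weights):
    """Sum-pool the per-atom embeddings into one molecule-level vector."""
    embedded = [phi(atom, weights) for atom in atoms]
    # math.fsum returns an exactly rounded sum, so atom order cannot matter.
    return [math.fsum(column) for column in zip(*embedded)]


def predict(atoms, w_phi, w_rho):
    """Map the pooled vector to a single scalar property estimate."""
    pooled = encode_set(atoms, w_phi)
    return math.fsum(w * h for w, h in zip(w_rho, pooled))


if __name__ == "__main__":
    w_phi = make_weights(4, 8, seed=0)
    w_rho = make_weights(8, 1, seed=1)[0]
    shuffled = list(reversed(ETHANOL))
    # Both calls print the same value: the encoder sees a set, not a sequence.
    print(predict(ETHANOL, w_phi, w_rho))
    print(predict(shuffled, w_phi, w_rho))
```

Because each atom is embedded on its own and the embeddings are only summed, reordering the atoms leaves the output unchanged, which is the property that lets a model operate on a molecule as a set rather than as a graph with explicit bonds.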

Keywords

graph neural networks

set representation learning

molecular set representation learning

ADME

reaction prediction

molecular property prediction

Supplementary weblinks

GitHub repository

The GitHub repository contains code, data, and examples related to the methods discussed in this manuscript.

Version History

Mar 15, 2024 Version 2

Oct 11, 2023 Version 1

Metrics

Views: 1,867

Downloads: 1,173

License

The content is available under CC BY 4.0

Funding

Swiss National Science Foundation

CRSII5_205884

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
