How Important Are GNN Architectures For QSAR Modelling?

28 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

This work investigates the average performance of Graph Neural Networks (GNNs) and molecular descriptors across a range of realistic QSAR problems. Using 9 GSK internal QSAR datasets, we evaluated 9 GNN layer architectures and their hyperparameters on a default set of descriptors. We then evaluated 5 descriptor classes across the same datasets using a common baseline model architecture. We show that no architecture performed better than any other, and that hyperparameters such as the learning rate, dropout, and the number of message-passing layers are crucial for performance. We likewise show that the choice of molecular descriptors has a consistent impact across datasets. Finally, analysing the effect of tautomers on descriptors revealed consistent deviations in the descriptors expected to be affected, such as hydrogen counts and graph centrality measures. Our recommendation is to direct modelling effort towards hyperparameter optimisation and feature selection rather than towards network architecture.
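
As a concrete illustration of the tautomer analysis described above, the sketch below is an assumption-laden example, not the authors' pipeline: it enumerates tautomers of a single molecule with RDKit and reports two descriptors one would expect to shift between tautomeric forms, namely per-atom hydrogen counts and a graph centrality measure computed with NetworkX. The example molecule (2-hydroxypyridine/2-pyridone) and the specific descriptor choices are illustrative assumptions, not taken from the paper.

# Illustrative sketch only; molecule and descriptors are assumptions, not the paper's setup.
import networkx as nx
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def descriptor_snapshot(mol):
    """A few descriptors expected to be tautomer-sensitive."""
    # Per-atom hydrogen counts: the proton that moves between tautomers
    # shows up as a change in these node-level features.
    h_counts = tuple(atom.GetTotalNumHs() for atom in mol.GetAtoms())
    # Closeness centrality on the hydrogen-explicit graph, so that moving a
    # proton changes the topology and hence the centrality values.
    graph = nx.from_numpy_array(Chem.GetAdjacencyMatrix(Chem.AddHs(mol)))
    closeness = nx.closeness_centrality(graph)
    mean_closeness = sum(closeness.values()) / graph.number_of_nodes()
    return {"per_atom_H": h_counts, "mean_closeness": round(mean_closeness, 4)}


if __name__ == "__main__":
    # 2-hydroxypyridine / 2-pyridone is a classic tautomer pair.
    mol = Chem.MolFromSmiles("Oc1ccccn1")
    enumerator = rdMolStandardize.TautomerEnumerator()
    for tautomer in enumerator.Enumerate(mol):
        print(Chem.MolToSmiles(tautomer), descriptor_snapshot(tautomer))

Running the sketch prints one line per enumerated tautomer, making it easy to see which descriptors stay fixed and which deviate when the proton moves between the oxygen and the ring nitrogen.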

Keywords

QSAR
GNN
Cheminformatics
Hyperparameter

Supplementary materials

Supplementary Information: Details of descriptors studied, statistical tests, and information and analysis on hyperparameter optimisation runs.
