KGG: Knowledge-Guided Graph Self-Supervised Learning to Enhance Molecular Property Predictions

25 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Molecular property prediction has become essential to accelerating drug discovery and materials science. Graph Neural Networks have recently demonstrated remarkable success in molecular representation learning; however, their broader adoption is impeded by two significant challenges: (1) data scarcity and constrained model generalization, because acquiring labeled data is expensive and time-consuming, and (2) inadequate initial node and edge features that fail to incorporate comprehensive chemical domain knowledge, notably orbital information. To address these limitations, we introduce a Knowledge-Guided Graph (KGG) framework that uses self-supervised learning to pre-train models on orbital-level features, reducing reliance on extensive labeled datasets. In addition, we propose novel representations for atomic hybridization and bond types that explicitly consider orbital engagement. Our pre-training strategy is cost-efficient: it uses approximately 250,000 molecules from the ZINC15 dataset, whereas contemporary approaches typically require between two and ten million molecules, and the smaller pre-training set consequently reduces the risk of data contamination. Extensive evaluations on diverse downstream molecular property datasets demonstrate that our method significantly outperforms state-of-the-art baselines. Complementary analyses, including t-SNE visualizations and comparisons with traditional molecular fingerprints, further validate the effectiveness and robustness of the proposed KGG approach.
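
The abstract does not specify how hybridization and bond types are encoded, so the following is only a minimal illustrative sketch of what "orbital-aware" atom and bond features could look like. It is not the KGG authors' feature set. The names `HYBRIDIZATION_ORBITALS`, `BOND_ORBITALS`, `encode_atom`, and `encode_bond` are hypothetical, and the choice of features (s-character, p-orbital counts, sigma/pi bond counts) is an assumption based on textbook valence-bond theory.

```python
# Hypothetical orbital-aware encodings; NOT the KGG authors' actual features.

# Hybridization -> (s orbitals in the hybrid set,
#                   p orbitals in the hybrid set,
#                   unhybridized p orbitals left over)
HYBRIDIZATION_ORBITALS = {
    "sp": (1, 1, 2),
    "sp2": (1, 2, 1),
    "sp3": (1, 3, 0),
}

# Bond type -> (sigma contribution, pi contribution).
# The aromatic entry assumes a benzene-like bond order of 1.5.
BOND_ORBITALS = {
    "single": (1.0, 0.0),
    "double": (1.0, 1.0),
    "triple": (1.0, 2.0),
    "aromatic": (1.0, 0.5),
}


def encode_atom(hybridization):
    """Return [s-character, hybridized p count, unhybridized p count]."""
    s, p, free_p = HYBRIDIZATION_ORBITALS[hybridization]
    s_character = s / (s + p)  # fraction of s orbital in each hybrid orbital
    return [s_character, float(p), float(free_p)]


def encode_bond(bond_type):
    """Return [sigma contribution, pi contribution] for a bond type."""
    sigma, pi = BOND_ORBITALS[bond_type]
    return [sigma, pi]


if __name__ == "__main__":
    # Ethene: each carbon is sp2 and the C=C bond is a double bond.
    print(encode_atom("sp2"))
    print(encode_bond("double"))
```

In a sketch like this, such vectors would replace or extend one-hot hybridization and bond-type features as initial node and edge inputs to a graph neural network; how KGG actually does this is described in the full preprint, not in this abstract.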

Keywords

Drug discovery
graph neural networks
knowledge graph
self-supervised learning
orbital information

Supplementary materials

Title
KGG: Knowledge-Guided Graph Self-Supervised Learning to Enhance Molecular Property Predictions
Description
Knowledge-Guided Graph (KGG) introduces orbital-aware atomic-hybridization and bond-type encodings, enabling cost-efficient self-supervised pre-training. Our framework reduces reliance on extensive labeled datasets and lowers the risk of data contamination. Comprehensive benchmarks across diverse molecular-property tasks on public MoleculeNet datasets, alongside t-SNE visualization and comparisons with traditional fingerprints, confirm that KGG consistently surpasses contemporary self-supervised learning baselines in effectiveness and robustness.
