Abstract
For high-throughput catalytic material discovery, Graph Neural Networks (GNNs) provide an efficient method for predicting the adsorption energies of adsorbates on transition metal surfaces. While GNNs perform well on in-domain prediction tasks, they often struggle to generalize to out-of-domain scenarios. This limitation necessitates a robust method for quantifying prediction uncertainty to enable informed catalyst discovery.
Gaussian Processes (GPs) offer a principled approach to uncertainty quantification within a Bayesian framework. However, standard GP implementations face key limitations, including time complexity that is cubic in the number of training points, high memory requirements, and an inability to learn meaningful representations from graph-structured inputs.
To address these issues, we introduce Deep Graph Kernel Learning (DGKL), a scalable framework that couples a GNN backbone with sparse variational Gaussian Processes (SVGPs) for uncertainty quantification in adsorption energy prediction. We benchmark DGKL against state-of-the-art methods, including ensemble/query-by-committee and Monte Carlo dropout, using ranking- and calibration-based metrics (e.g., Spearman's rank correlation, negative log-likelihood, miscalibration area) as well as error-based diagnostics (e.g., RMSE vs. root mean variance (RMV) and error vs. standard deviation plots).
DGKL consistently outperforms existing methods across all evaluation metrics while maintaining computational efficiency and scalability. For example, the correlation coefficient between RMSE and RMV for DGKL ranges from 0.98 to 1.00, slightly exceeding the next best method (ensemble learning). More significantly, the expected normalized calibration error (ENCE) for DGKL ranges from 0.06 to 0.15 across different datasets and GNN backbones, while the ensemble method exhibits a wider range of 0.36 to 1.55.
DGKL can be incorporated into an active learning framework to iteratively explore catalytic material space, guiding the discovery of novel active catalysts. Additionally, we propose a variant of DGKL capable of predicting atomic-level uncertainty, a capability absent from existing methods. This enables fine-grained insights into out-of-domain data and provides a pathway for enhancing predictive model performance.
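To make the scalability claim concrete, the following is a minimal NumPy sketch of how inducing points reduce GP inference from cubic to O(nm²) cost, using the subset-of-regressors approximation rather than the paper's full SVGP training scheme; the 1-D toy features, the `rbf` kernel, and all hyperparameter values stand in for the learned GNN embeddings and are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A (n, d) and B (m, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def sparse_gp_predict(X, y, Z, Xs, noise=1e-2, lengthscale=1.0):
    """Sparse GP regression via the subset-of-regressors approximation.

    Only the (m, m) matrix over inducing points Z is inverted, so the
    cost is O(n m^2) instead of the O(n^3) of an exact GP.
    Returns the predictive mean and variance at test inputs Xs.
    """
    Kzx = rbf(Z, X, lengthscale)                       # (m, n)
    Kzz = rbf(Z, Z, lengthscale)                       # (m, m)
    Ksz = rbf(Xs, Z, lengthscale)                      # (s, m)
    Sigma = np.linalg.inv(Kzz + Kzx @ Kzx.T / noise)   # (m, m) posterior cov. factor
    mean = Ksz @ Sigma @ (Kzx @ y) / noise             # predictive mean
    var = np.einsum("ij,jk,ik->i", Ksz, Sigma, Ksz)    # predictive variance (diag)
    return mean, var

# Toy usage: 1-D features standing in for GNN embeddings.
X = np.linspace(0, 6, 50)[:, None]
y = np.sin(X[:, 0])
Z = np.linspace(0, 6, 10)[:, None]   # 10 inducing points instead of 50 data points
mean, var = sparse_gp_predict(X, y, Z, X)
```

In DGKL-style deep kernel learning, `X` would be the embeddings produced by the GNN backbone, and the inducing locations and kernel hyperparameters would be optimized jointly with the network under a variational objective.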