Abstract
Hydration free energy (HFE) is a fundamental thermodynamic property with broad relevance in both chemistry and biology, particularly in solvation processes. Traditional methods for computing HFE, such as molecular dynamics simulations, are often computationally intensive and require significant domain-specific calibration. Recent advances in machine learning (ML) have enabled more efficient HFE predictions, especially for small molecules. However, many existing ML models lack interpretability and often rely on large, opaque feature sets. In this study, we use a cross-attention-based graph neural network (GNN) to predict the HFE of small molecules from the FreeSolv dataset. Our model integrates graph representations of solute and solvent molecules and captures their mutual interactions through a cross-attention mechanism during message passing. To enhance physical interpretability, we incorporate a compact set of six global molecular descriptors: approximate electrostatic energy computed via a closed-form Generalized Born (GB) model, polar surface area, logarithm of the octanol–water partition coefficient (log P), number of hydrogen bond donors, number of hydrogen bond acceptors, and number of rotatable bonds. We benchmark our model against classical machine learning methods and recent GNN-based baselines. Our attention-based GNN not only improves prediction accuracy but also maintains transparency in feature importance. It achieves a mean absolute error (MAE) of 0.54 ± 0.04 kcal/mol and a root mean square error (RMSE) of 0.75 ± 0.03 kcal/mol, improvements of approximately 23% and 36%, respectively, over the best-performing baseline. The ablation study reveals that, among the global descriptors used for the solute, electrostatic energy and polar surface area are the most critical in reducing prediction error, followed by the features related to hydrogen bonding.
This combination of high accuracy and strong interpretability makes our framework well-suited for large-scale, data-driven investigations of solvation free energies.
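To make the cross-attention idea concrete, the following is a minimal sketch in plain Python of scaled dot-product cross-attention in which each solute node attends over all solvent nodes. It is an illustration under stated assumptions, not the model described above: the function names, the toy two-dimensional node features, the absence of learned query/key/value projections, and the residual update are all hypothetical choices made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(solute_nodes, solvent_nodes):
    """Each solute node (query) attends over all solvent nodes (keys/values).

    Assumption: no learned projections; raw node features serve as
    queries, keys, and values. A real model would learn these maps.
    """
    d = len(solute_nodes[0])
    updated, weights = [], []
    for q in solute_nodes:
        # Scaled dot-product score between this solute node and every solvent node.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in solvent_nodes
        ]
        w = softmax(scores)
        # Attention-weighted sum of solvent node features.
        context = [
            sum(wj * v[c] for wj, v in zip(w, solvent_nodes)) for c in range(d)
        ]
        # Residual update (an illustrative choice, not taken from the paper).
        updated.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return updated, weights


# Toy features: three solute nodes and two solvent nodes (hypothetical values).
solute = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
solvent = [[1.0, 0.0], [0.0, 1.0]]
updated, attn = cross_attention(solute, solvent)
```

Each row of `attn` sums to one and can be read as how strongly a given solute node weights each solvent node, which is the kind of quantity one would inspect when examining solute–solvent interactions in an attention-based model.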
Supplementary materials
Title
Supporting Information: Hydration Free Energies for Small Molecules with Physics-based Descriptors: Graph Neural Network with Cross Attention
Description
Additional information regarding molecular graph features, computational setup, hyperparameter configurations, and implementation specifics is provided in the Supporting Information (SI). The SI also includes kernel density plots for the six global descriptors used in this study.