Hydration Free Energies for Small Molecules with Physics-based Descriptors: Graph Neural Network with Cross Attention

16 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Hydration free energy (HFE) is a fundamental thermodynamic property with broad relevance in both chemistry and biology, particularly in solvation processes. Traditional methods for computing HFE, such as molecular dynamics simulations, are often computationally intensive and require significant domain-specific calibration. Recent advances in machine learning (ML) have enabled more efficient HFE predictions, especially for small molecules. However, many existing ML models lack interpretability and often rely on large, opaque feature sets. In this study, we use a cross-attention-based graph neural network (GNN) to predict the HFE of small molecules from the FreeSolv dataset. Our model integrates graph representations of solute and solvent molecules and captures their mutual interactions through a cross-attention mechanism during message passing. To enhance physical interpretability, we incorporate a compact set of six global molecular descriptors: approximate electrostatic energy computed via a closed-form Generalized Born (GB) model, polar surface area, the logarithm of the octanol–water partition coefficient (log P), the number of hydrogen bond donors, the number of hydrogen bond acceptors, and the number of rotatable bonds. We benchmark our model against classical machine learning methods and recent GNN-based baselines. Our attention-based GNN not only improves prediction accuracy but also maintains transparency in feature importance. It outperforms existing baselines, achieving a mean absolute error (MAE) of 0.54 ± 0.04 kcal/mol and a root mean square error (RMSE) of 0.75 ± 0.03 kcal/mol, improvements of approximately 23% and 36%, respectively, over the best-performing baseline. The ablation study reveals that, among the global descriptors used for the solute, electrostatic energy and polar surface area are the most critical in reducing prediction error, followed by features related to hydrogen bonding. This combination of high accuracy and strong interpretability makes our framework well-suited for large-scale, data-driven investigations of solvation free energies.
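
As an illustration of the first descriptor listed in the abstract, the following is a minimal sketch of a closed-form Generalized Born electrostatic energy. It assumes the widely used Still-type pairwise expression; the abstract does not say which GB variant, charge model, or Born radii the authors use, so the function name `gb_polar_energy`, the dielectric constants, and the toy inputs are all illustrative assumptions and not the authors' implementation.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); standard value for these units.
COULOMB_KCAL = 332.06


def gb_polar_energy(charges, coords, born_radii, eps_in=1.0, eps_out=78.5):
    """Still-type Generalized Born polarization energy in kcal/mol.

    Assumed form (not confirmed by the abstract):
        dG = -1/2 * k * (1/eps_in - 1/eps_out) * sum_ij q_i q_j / f_ij
        f_ij = sqrt(r_ij^2 + R_i R_j * exp(-r_ij^2 / (4 R_i R_j)))
    charges are in units of e; coords and born_radii are in Angstrom.
    """
    prefactor = -0.5 * COULOMB_KCAL * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    # The double sum runs over all ordered pairs, including i == j,
    # where f_ij reduces to the Born radius (the self-energy term).
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total


# Toy inputs (hypothetical, not taken from FreeSolv): a two-site dipole.
toy_energy = gb_polar_energy(
    charges=[0.5, -0.5],
    coords=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    born_radii=[1.5, 1.5],
)
print(f"toy GB energy: {toy_energy:.3f} kcal/mol")
```

For a single charged site the expression reduces to the Born ion formula, which is a convenient sanity check. Being a single scalar per molecule, a quantity of this kind can serve as one global descriptor alongside the other five.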

Keywords

Hydration Free Energy
Machine Learning

Supplementary materials

Supporting Information: Hydration Free Energies for Small Molecules with Physics-based Descriptors: Graph Neural Network with Cross Attention
Additional information regarding molecular graph features, computational setup, hyperparameter configurations, and implementation specifics is provided in the Supporting Information (SI). The SI also includes kernel density plots for the six global descriptors used in this study.
