Abstract
The accuracy of computational models of water is key to atomistic simulations of biomolecules. Here we explore a decoupled framework that combines classical physics-based models with deep neural networks (DNNs) that correct the residual error in hydration free energy (HFE) predictions. Our main goal is to evaluate this framework on out-of-distribution data (molecules that differ significantly from those used in training), where DNNs are known to struggle. Several common physics-based solvation models are used in the evaluation. Graph neural network architectures are tested for their ability to generalize across multiple dataset splits, including splits with out-of-distribution HFEs and with unseen molecular scaffolds. Our most important finding is that, for out-of-distribution data, where DNNs alone often struggle, the physics + DNN models consistently improve on the predictions of the physics models. For in-distribution data, the DNN corrections significantly improve the accuracy of the physics-based models, yielding a final RMSE below 1 kcal/mol and a relative improvement of 40% to 65% in most cases. The accuracy of the physics + DNN models tends to improve when the 6% of molecules with the highest experimental uncertainty are removed. This study provides insights into the potential and limitations of combining physics and machine learning for molecular modeling and offers a practical, generalizable strategy.
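The decoupled idea (keep the physics-based prediction, learn only its residual error, then add the learned correction back) can be illustrated with a minimal sketch. It assumes a single hypothetical molecular descriptor and toy HFE values that are not data from this study; a one-variable least-squares line stands in for the graph neural network, and "relative improvement" is assumed to mean the fractional reduction in RMSE. The study's actual models and definitions may differ.

```python
import math
from typing import Callable, Sequence


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between predictions and reference values (kcal/mol)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def fit_residual_model(features: Sequence[float],
                       residuals: Sequence[float]) -> Callable[[float], float]:
    """Stand-in for the DNN (assumption): an ordinary least-squares line fitted to
    the residual (experiment - physics) as a function of one hypothetical descriptor."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, residuals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def relative_improvement(rmse_physics: float, rmse_corrected: float) -> float:
    """Assumed definition: fractional RMSE reduction relative to the physics model."""
    return (rmse_physics - rmse_corrected) / rmse_physics


# Toy training set (hypothetical numbers): descriptor, physics HFE, experimental HFE.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
phys_train = [-3.0, -4.0, -5.0, -6.0, -7.0]
exp_train = [-3.4, -3.9, -4.6, -4.9, -5.5]

# Step 1: the correction model is trained only on the physics model's residual error.
residual_train = [e - p for e, p in zip(exp_train, phys_train)]
correction = fit_residual_model(x_train, residual_train)

# Step 2: the physics prediction is kept unchanged and the learned correction is added.
x_test = [1.5, 3.5, 4.5]
phys_test = [-2.0, -8.0, -5.0]
exp_test = [-2.2, -7.2, -3.8]
corrected_test = [p + correction(x) for p, x in zip(phys_test, x_test)]

rmse_phys = rmse(phys_test, exp_test)
rmse_corr = rmse(corrected_test, exp_test)
print(f"physics RMSE: {rmse_phys:.2f} kcal/mol")
print(f"physics + correction RMSE: {rmse_corr:.2f} kcal/mol")
print(f"relative improvement: {relative_improvement(rmse_phys, rmse_corr):.0%}")
```

Because the physics model is never retrained, any residual learner (a graph neural network in the study) can be swapped in for `fit_residual_model` without touching the physics side; the toy numbers here are far cleaner than real HFE data, so the printed improvement should not be read as representative.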
Supplementary materials
Details of the physics-based solvation models; additional tables and figures; access to the open-source code.