Abstract
The solvation free energy is a fundamental property of a solute that is directly related to its solubility, which in turn is critical for processes ranging from pharmaceutical to materials manufacturing. We seek efficient strategies to predict the solvation free energy from molecular structure alone, using electronic structure calculations with the uESE continuum solvation model, so that these predictions can serve as accurate descriptors in physics-informed machine learning models of solubility. In benchmarks on the Minnesota Solvation Database, single conformations generated with the MMFF94 molecular mechanics force field yielded predictive accuracy comparable to that obtained from reference gas-phase geometries optimized with electronic structure calculations. Surprisingly, exploring multiple conformations did not consistently improve predictions, suggesting that uESE performs well with a single representative input structure. Evaluation on the independent dGsolvDB1 dataset demonstrated reasonable predictive ability with single conformations generated by molecular mechanics, along with some generalizability to novel chemical space. Our findings indicate that combining fast molecular-mechanics-based structure generation with uESE is a promising approach for efficient and reasonably accurate solvation free energy predictions, supporting its use in high-throughput screening and in machine learning for solubility prediction.
Supplementary materials
Title
Supporting Information 1
Description
Supporting Information 1 (SI-1) contains a discussion of the error as a function of solute size, measured by the number of non-hydrogen (intramolecular) bonds and atoms; images of the chemical structures of interest; and a note on the use of the average absolute percent deviation (AAPD).
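SI-1 includes a note on the use of the AAPD, but this page does not define the metric. The sketch below assumes the common definition, AAPD = (100/N) * sum(|pred - ref| / |ref|), and computes it alongside the mean absolute error (MAE) for comparison. The function names and the example values are hypothetical illustrations, not taken from the datasets. The example also shows a general property of any percent-based metric: the same small absolute error produces a much larger percent deviation when the reference value is close to zero.

```python
from typing import Sequence


def _check(predicted: Sequence[float], reference: Sequence[float]) -> None:
    # Both metrics compare predictions and references entry by entry.
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("expected two non-empty sequences of equal length")


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """MAE in the units of the inputs (e.g. kcal/mol)."""
    _check(predicted, reference)
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def aapd(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute percent deviation, assuming the common definition
    AAPD = (100 / N) * sum(|pred - ref| / |ref|)."""
    _check(predicted, reference)
    total = 0.0
    for p, r in zip(predicted, reference):
        if r == 0:
            # Percent deviation is undefined for a zero reference value.
            raise ZeroDivisionError("AAPD is undefined for a zero reference value")
        total += abs(p - r) / abs(r)
    return 100.0 * total / len(reference)


# Hypothetical predicted and reference values in kcal/mol.
pred = [-5.0, -2.2]
ref = [-4.0, -2.0]
print(mean_absolute_error(pred, ref), aapd(pred, ref))

# A 0.2 kcal/mol absolute error on a reference value near zero.
print(aapd([-0.3], [-0.1]))
```

Because AAPD divides by |ref|, the sketch raises an error for a zero reference value rather than silently skipping that entry.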
Title
Supporting Information 2
Description
Supporting Information 2 (SI-2) contains spreadsheets tabulating all predictions and the descriptors used to characterize the solute molecules, together with a summary of the error broken down by solvent (including non-water solvents) and by uESE solvent class. We also provide a summary of the error by solute descriptor (torsional angles, atoms and bonds, hydrogen bond donor sites, and hydrogen bond acceptor sites). Each descriptor-based error summary is reported for all solvents, for water alone, and for non-water solvents.
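The breakdown described for SI-2 (error per solvent, and per descriptor value for all solvents, water, and non-water solvents) can be sketched as a simple group-by over prediction records. This is a minimal sketch under stated assumptions: the `Prediction` record layout, the `n_hbd` field (number of hydrogen bond donor sites), the use of MAE as the error measure, and the example rows are all hypothetical and not taken from the spreadsheets. A uESE solvent-class key could be passed to `mae_by_group` in the same way.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass
class Prediction:
    """One solute-solvent pair (hypothetical record layout)."""
    solute: str
    solvent: str
    predicted: float  # predicted solvation free energy, assumed kcal/mol
    reference: float  # reference value, same unit
    n_hbd: int        # hypothetical descriptor: number of H-bond donor sites


def mean_abs_error(rows: List[Prediction]) -> float:
    """Mean absolute error; NaN for an empty group."""
    if not rows:
        return float("nan")
    return sum(abs(r.predicted - r.reference) for r in rows) / len(rows)


def solvent_subsets(rows: List[Prediction]) -> Dict[str, List[Prediction]]:
    """Split rows into the overall, water, and non-water subsets."""
    return {
        "overall": rows,
        "water": [r for r in rows if r.solvent == "water"],
        "non-water": [r for r in rows if r.solvent != "water"],
    }


def mae_by_group(
    rows: List[Prediction], key: Callable[[Prediction], Hashable]
) -> Dict[Hashable, float]:
    """MAE for each value of key(row), e.g. each solvent or descriptor value."""
    groups: Dict[Hashable, List[Prediction]] = defaultdict(list)
    for r in rows:
        groups[key(r)].append(r)
    return {g: mean_abs_error(v) for g, v in groups.items()}


# Made-up illustrative rows, not data from the study.
rows = [
    Prediction("A", "water", -4.5, -5.0, 1),
    Prediction("A", "octanol", -4.75, -4.5, 1),
    Prediction("B", "water", -1.0, -0.75, 0),
]
print(mae_by_group(rows, lambda r: r.solvent))
for name, subset in solvent_subsets(rows).items():
    print(name, mean_abs_error(subset), mae_by_group(subset, lambda r: r.n_hbd))
```

Keeping the grouping key as a function lets the same routine produce per-solvent, per-solvent-class, and per-descriptor tables without duplicating the error calculation.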