Abstract
Accurate prediction of aqueous solubility remains a critical challenge in the chemical and pharmaceutical industries, with direct consequences for drug development and delivery. This study revisits this well-explored problem using modern computational resources. We apply an automated network optimizer model that integrates two optimization processes, one for molecular features and one for hyperparameters, streamlining the traditionally complex hyperparameter search while providing an efficient interpretation of molecular properties. With feature optimization, our deep neural network model improves both the speed and accuracy of molecular property prediction, achieving an average R² = 0.991. This result outperforms conventional hyperparameter optimization methods such as grid search and random search in predicting the intrinsic solubility of 3,745 compounds across four external experimental datasets. Using feature importance analysis, we identified key molecular features and structures that significantly influence solubility. Additionally, combining three molecular fingerprints (Morgan, MACCS keys, and Avalon) with molecular descriptors enhances model performance and provides a deeper understanding of the relationship between molecular structure and solubility within the physicochemical feature optimization process. These findings underscore the potential of machine learning models to improve predictive modeling of physical properties, to extend automated modeling and feature selection to new chemical datasets, and to offer explainable insights into the principles driving solubility predictions.
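For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard coefficient of determination, R² = 1 − SS_res / SS_tot. The abstract does not state how the authors computed or averaged R², so this is the textbook definition, not necessarily their exact procedure. The function name `r2_score` and the sample values are hypothetical and serve only as illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (textbook definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical log-solubility values, made up for illustration only.
observed = [-2.0, -3.5, -1.0, -4.5]
predicted = [-2.1, -3.4, -1.2, -4.4]
print(round(r2_score(observed, predicted), 3))
```

A value of 1 means the predictions match the observations exactly, while a model that always predicts the mean of the observations scores 0.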
Supplementary materials
Title
Prediction of intrinsic solubility for drug-like organic compounds using Automated Network Optimizer (ANO) for physicochemical feature and hyperparameter optimization
Description
The supplementary materials include essential additional information to support the main text, comprising one table, six figures, and an appendix that details the machine learning methodology employed in this study.