Investigation of Effective Molecular Dynamics-derived Properties on Drug Solubility via Machine Learning

22 April 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Solubility is an essential factor in drug discovery and development, as it plays a vital role in a medication's bioavailability and therapeutic efficacy. Understanding solubility at the early stages of drug discovery is essential for reducing resource consumption and improving the probability of clinical success by prioritizing compounds with optimal solubility. Molecular dynamics (MD) simulation is a powerful tool for modeling a range of physicochemical properties, particularly solubility. MD simulations provide a detailed view of molecular interactions and dynamics, thus revealing insights into the factors influencing solubility. This research aims to statistically examine the importance of ten MD-derived properties, along with Log P, one of the most influential experimental properties, for aqueous drug solubility using machine learning (ML) techniques. For this purpose, a dataset of 199 drugs from diverse classes was compiled from the literature, each drug was studied by MD simulation, and the relevant properties were extracted and selected as features. The corresponding octanol-water partition coefficients (Log P) from previous studies were also included. Through this analysis, the properties with the highest influence on solubility were identified. These were then used as input features for four ensemble machine learning algorithms, namely Random Forest, Extra Trees, XGBoostRF, and Gradient Boosting. The results show that seven properties (Log P, SASA, Columbic_t, DGSolv, RMSD, AVG shell, and LJ) are highly effective in predicting solubility. On the test set, the best-performing algorithm, Gradient Boosting, achieved a predictive R² = 0.87 and RMSE = 0.537. This research underscores the potential of integrating MD simulations with machine learning methodologies to improve the accuracy and efficiency of aqueous solubility predictions in drug development.
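
The abstract reports test-set performance as R² and RMSE. As a reading aid, the sketch below shows how these two metrics are conventionally defined, written in plain Python. It is not the authors' code: the function names `r_squared` and `rmse` and the four sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical measured and predicted solubility values (illustration only,
# not taken from the study's dataset).
y_true = [-1.0, -2.0, -3.0, -4.0]
y_pred = [-1.5, -2.0, -2.5, -4.0]

print(f"R2   = {r_squared(y_true, y_pred):.3f}")
print(f"RMSE = {rmse(y_true, y_pred):.3f}")
```

R² measures the fraction of variance in the measured values that the predictions explain, while RMSE expresses the typical prediction error in the units of the target, so the two are usually reported together.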

Keywords

Drug Solubility
Molecular Dynamics
Machine Learning
Aqueous Solubility Prediction
Ensemble Algorithms
Feature Selection

Supplementary materials

Title
Supplementary Materials: Molecular Dynamics and Machine Learning Analysis of Drug Solubility.
Description
This supplementary information (SI) document complements the manuscript "Investigation of Effective Molecular Dynamics-derived Properties on Drug Solubility via Machine Learning". It provides additional data, analyses, and visualizations to support the study's findings. The SI includes:
- A statistical summary and complete dataset of 199 drugs, covering solubility, LogP, and molecular dynamics-derived properties.
- Figures illustrating variable relationships, distributions, and specific property analyses for a sample drug.
- Details on machine learning model hyperparameters and feature selection results.
- References for data sources.
This SI enhances the transparency and reproducibility of the research for readers in computational biology, cheminformatics, and drug discovery.
