Abstract
This study addresses key challenges in training machine learning (ML) models on ligand-based representations to predict Absorption, Distribution, Metabolism, Excretion, and Toxicology (ADME(T)) properties. We propose a structured approach to feature selection, moving beyond the conventional practice of combining different representations without systematic reasoning. We also strengthen model evaluation by combining cross-validation with statistical hypothesis testing, so that observed differences between models can be checked for statistical significance rather than judged from raw scores alone. Together, these steps aim to make ADME(T) predictions more reliable and model evaluations more dependable and informative.