Abstract
Drug registration requires a risk assessment of new active pharmaceutical ingredients and excipients to ensure that they are safe for human health and the environment. However, traditional risk assessment is generally expensive and relies heavily on in vivo testing. Stacking ensemble learning is a machine learning (ML) approach that has performed well in quantitative structure-toxicity relationship (QSTR) studies. In this study, we developed ToxSTK, a multi-target toxicity assessment tool based on stacking ensemble learning. Our aim was to create an ML tool that makes toxicity assessment more affordable and less reliant on animal models. We focused on four targets commonly assessed in early-stage drug development: cardiotoxicity, immunotoxicity, white blood cell toxicity, and mutagenicity. Our model integrated twelve molecular fingerprints with four ML algorithms, generating 36 novel predictive features (PFs), which were then combined to construct the final meta-decision model. Our results showed that the ToxSTK model exceeds the accepted thresholds for standard regression and classification metrics, indicating that it predicts chemical toxicity reliably and accurately within its applicability domain. The model also passed the y-randomization test, confirming that the identified QSTR is robust and not the result of chance correlation. In addition, it outperforms existing ML methods for these endpoints, which supports its usefulness in risk assessment applications. We recommend incorporating this stacking ensemble learning framework into chemical risk assessment pipelines to improve model generalization, accuracy, robustness, and reliability.
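The stacking idea summarized above can be illustrated with a minimal, dependency-free sketch. It is not the ToxSTK implementation and rests on several hypothetical choices. The two "fingerprints" (`fp_a`, `fp_b`) are randomly generated toy bit vectors. The two base learners (a Tanimoto k-nearest-neighbour scorer and a nearest-centroid scorer) and the logistic-regression meta-model stand in for the twelve fingerprints and four algorithms used in the study. All function names are illustrative. Each (fingerprint, learner) pair yields one predictive-feature column built from out-of-fold predictions, which is a common stacking convention assumed here. A meta-model is then fitted on those columns. The last lines mimic a y-randomization check by shuffling the labels and refitting.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(p & q for p, q in zip(a, b))
    either = sum(p | q for p, q in zip(a, b))
    return both / either if either else 0.0


def knn_learner(train_X, train_y, x, k=3):
    """Hypothetical base learner 1: share of toxic labels among the k most similar compounds."""
    nearest = sorted(range(len(train_X)), key=lambda i: -tanimoto(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def centroid_learner(train_X, train_y, x):
    """Hypothetical base learner 2: relative closeness to the mean toxic vs. non-toxic vector."""
    def closeness(label):
        rows = [r for r, lbl in zip(train_X, train_y) if lbl == label]
        if not rows:
            return 0.0
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        return 1.0 - sum(abs(a - c) for a, c in zip(x, centroid)) / len(x)
    s_tox, s_non = closeness(1), closeness(0)
    return s_tox / (s_tox + s_non) if s_tox + s_non else 0.5


def predictive_features(views, y, learners, n_folds=5):
    """One PF column per (fingerprint, learner) pair, filled with out-of-fold
    predictions so the meta-model never sees in-sample base-model scores."""
    n = len(y)
    fold = [i % n_folds for i in range(n)]
    columns = []
    for X in views.values():
        for learner in learners:
            col = [0.0] * n
            for f in range(n_folds):
                tr = [i for i in range(n) if fold[i] != f]
                tX, ty = [X[i] for i in tr], [y[i] for i in tr]
                for i in range(n):
                    if fold[i] == f:
                        col[i] = learner(tX, ty, X[i])
            columns.append(col)
    return [list(row) for row in zip(*columns)]  # n rows x (views * learners) PFs


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta(P, y, lr=0.5, epochs=300):
    """Meta-model: logistic regression on the PF matrix, trained by plain SGD."""
    w, b = [0.0] * len(P[0]), 0.0
    for _ in range(epochs):
        for row, target in zip(P, y):
            g = sigmoid(b + sum(wj * pj for wj, pj in zip(w, row))) - target
            b -= lr * g
            w = [wj - lr * g * pj for wj, pj in zip(w, row)]
    return lambda row: sigmoid(b + sum(wj * pj for wj, pj in zip(w, row)))


def accuracy(model, P, y):
    """Share of rows where the thresholded meta-model output matches the label."""
    return sum((model(row) >= 0.5) == bool(t) for row, t in zip(P, y)) / len(y)


def toy_fingerprints(labels, n_bits, rng):
    """Hypothetical data: toxic compounds tend to set the first half of the bits."""
    return [[int(rng.random() < (0.8 if (j < n_bits // 2) == bool(t) else 0.2))
             for j in range(n_bits)] for t in labels]


rng = random.Random(0)
y = [i % 2 for i in range(40)]
views = {"fp_a": toy_fingerprints(y, 16, rng), "fp_b": toy_fingerprints(y, 32, rng)}
learners = [knn_learner, centroid_learner]

P = predictive_features(views, y, learners)
meta = fit_meta(P, y)
print("PF matrix:", len(P), "x", len(P[0]), "| meta accuracy:", accuracy(meta, P, y))

# y-randomization: shuffle labels, rebuild the PFs, refit; performance should collapse.
y_perm = y[:]
rng.shuffle(y_perm)
P_perm = predictive_features(views, y_perm, learners)
print("y-randomized accuracy:", accuracy(fit_meta(P_perm, y_perm), P_perm, y_perm))
```

Out-of-fold PFs keep the meta-model from simply learning which base model overfits the training rows best. For brevity, the sketch reports the meta-model's accuracy on its own training rows. A real evaluation, including the y-randomization comparison, would use held-out data or cross-validation.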