AI-Driven QSAR and SHAP Analysis of a Scaffold-Focused, Novel Triazole–Naphthalene Library Designed to Target Tau Protein Aggregation.

30 May 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Tau protein aggregation is a hallmark of Alzheimer’s disease, and disrupting this process is a compelling therapeutic strategy. In this scaffold-focused pilot study, we present an AI-driven cheminformatics workflow to evaluate a curated library of 16 novel triazole–naphthalene derivatives (TNDs) designed for synthetic accessibility and tau-binding pharmacophore features. Molecular descriptors were computed using RDKit, and highly correlated descriptors (Pearson r > 0.95) were filtered out. Three machine learning models (Random Forest, XGBoost, and Support Vector Regression, SVR) were trained to predict AutoDock Vina docking scores against phosphorylated tau protein (PDB ID: 6HRF). SHAP analysis identified rotatable bond count and hydrophobic surface area as key predictors of docking affinity. The SVR model demonstrated consistent performance (CV RMSE = 0.50 kcal/mol), while PCA and t-SNE confirmed chemical space diversity across the TND scaffold. Retrospective docking highlighted three compounds with sub-micromolar predicted affinity, forming key interactions with tau’s microtubule-binding cleft (e.g., Ser285 and Val287). Although the dataset is small, this study establishes a reproducible and interpretable machine learning framework for tau-targeted inhibitor design. A follow-up study is underway using an expanded dataset of more than 40 known tau aggregation inhibitors to enhance generalizability, support scaffold hopping, and enable de novo molecular design. Future work will also incorporate molecular dynamics (MD) simulations and in vitro tau aggregation assays to experimentally validate binding stability and biological activity. All code and data are open-source to ensure full transparency and reproducibility in Alzheimer’s drug discovery.
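
The descriptor-filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the greedy keep-first strategy, the use of the absolute value of r, and the toy descriptor values are all assumptions made for illustration. Only the 0.95 threshold comes from the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.95):
    """Greedy filter (assumed strategy): keep a descriptor only if its
    |r| against every already-kept descriptor does not exceed the threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy values for four compounds (not data from the study).
toy = {
    "MolWt": [310.0, 324.0, 338.0, 352.0],
    "HeavyAtomCount": [23.0, 24.0, 25.0, 26.0],  # linear in MolWt, so r = 1
    "NumRotatableBonds": [3.0, 5.0, 2.0, 4.0],
}

print(drop_correlated(toy))
```

In this toy case the second descriptor is dropped because it is perfectly correlated with the first, while the third is retained. With only 16 compounds, a filter of this kind mainly serves to reduce redundancy before model fitting; which member of a correlated pair is kept depends on column order in this sketch.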

Keywords

AI/ML
Alzheimer’s disease
Tau protein
Triazole
Molecular docking
QSAR
Virtual screening
