Abstract
Tau protein aggregation is a hallmark of Alzheimer’s disease, and disrupting this process
presents a compelling therapeutic strategy. In this scaffold-focused pilot study, we present
an AI-driven cheminformatics workflow to evaluate a curated library of 16 novel
triazole–naphthalene derivatives (TNDs) designed for synthetic accessibility and
tau-binding pharmacophore features. Molecular descriptors were computed using RDKit,
and redundant descriptors were removed when their pairwise Pearson correlation exceeded
0.95 (|r| > 0.95). Three machine learning models (Random Forest, XGBoost, and Support
Vector Regression, SVR) were trained to predict AutoDock Vina docking scores against
phosphorylated tau protein (PDB ID: 6HRF). SHAP analysis
revealed rotatable bond count and hydrophobic surface area as key predictors of docking
affinity. The SVR model performed consistently under cross-validation (CV RMSE = 0.50
kcal/mol), and PCA and t-SNE projections indicated chemical space diversity across the
TND scaffold.
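
The correlation-based descriptor filter described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the filter is a greedy pass that drops the later descriptor of any pair with |r| > 0.95, and it uses a synthetic 16-row matrix with illustrative column names rather than real RDKit output. The helper name `drop_correlated` is hypothetical.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Greedily keep columns whose |Pearson r| with every kept column is <= threshold.

    Assumes no constant columns (their correlation is undefined).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not highly correlated with any kept column.
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Synthetic stand-in for a 16-compound descriptor table (values are made up).
a = np.arange(16.0)
X = np.column_stack([
    a,              # "MolWt"
    2.0 * a + 1.0,  # "HeavyAtomMolWt": perfectly correlated with the first column
    (-1.0) ** a,    # "NumRotatableBonds": alternating, weakly correlated
    a % 4,          # "TPSA": periodic, weakly correlated
])
names = ["MolWt", "HeavyAtomMolWt", "NumRotatableBonds", "TPSA"]

X_kept, kept_names = drop_correlated(X, names)
print(kept_names)
```

In this toy table only the second column is removed, because it is a linear function of the first. The order of the columns determines which member of a correlated pair survives, so a real pipeline would need to fix that order deliberately.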
Retrospective docking highlighted three compounds with sub-micromolar predicted
affinity, forming key interactions with tau’s microtubule-binding cleft (e.g., Ser285 and
Val287). While the dataset is limited in size, this study establishes a reproducible and
interpretable machine learning framework for tau-targeted inhibitor design. A follow-up
study is underway using an expanded dataset of 40+ known tau aggregation inhibitors to
enhance generalizability, support scaffold hopping, and enable de novo molecular design.
In addition, future work will incorporate molecular dynamics (MD) simulations and in
vitro tau aggregation assays to experimentally validate binding stability and biological
activity. All code and data are openly available to support transparency and reproducibility
in Alzheimer’s drug discovery.
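
For readers interpreting the sub-micromolar predicted affinities mentioned above, one common way to relate a docking score to a dissociation constant is ΔG = RT ln(Kd). The sketch below assumes the Vina score can be read as an approximate binding free energy and that T = 298.15 K; the abstract does not state which conversion the study used, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_score(delta_g_kcal, temperature_k=298.15):
    """Estimate Kd (mol/L) from a binding free energy in kcal/mol via dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

def score_for_kd(kd_molar, temperature_k=298.15):
    """Inverse relation: the free energy (kcal/mol) that corresponds to a given Kd."""
    return R_KCAL * temperature_k * math.log(kd_molar)

# Score at which the estimated Kd equals 1 micromolar (assumed T = 298.15 K).
threshold = score_for_kd(1e-6)
print(f"1 uM corresponds to about {threshold:.2f} kcal/mol")
```

Under these assumptions, a score more negative than roughly -8.2 kcal/mol maps to an estimated Kd below 1 µM. Docking scores are coarse approximations, so such values are best read as a ranking aid rather than as measured affinities.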