Abstract
In structure-based virtual screening (SBVS), it is critical that machine-learning scoring functions (MLSFs) capture protein-ligand atomic interaction patterns. To assess whether MLSFs capture these interactions, we built a benchmark of cross-target generalization ability for protein-ligand binding affinity prediction. By focusing on the local domains in protein-ligand binding pockets, we developed a standardized pocket Pfam-based clustering (Pfam-cluster) approach for this benchmark. Subsequently, 11 typical MLSFs were tested using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV). Surprisingly, the performance of every tested model decreased from Random-CV to Seq-CV to Pfam-CV, and none showed satisfactory generalization capacity. Interpretability analysis revealed that MLSF predictions on novel targets relied on buried solvent-accessible surface area (SASA)-related features of the complex structures. By combining buried SASA-related information with ligand-specific patterns that were shared only among structurally similar compounds, Random forest (RF)-Score attained higher performance in Random-CV tests. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and being cautious about the features learned by MLSFs.
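The difference between Random-CV and a cluster-based scheme such as Pfam-CV can be illustrated with a minimal sketch. It is not the authors' implementation (their scripts are in the linked repository); it assumes each complex carries exactly one cluster label, and the function name `cluster_kfold`, the greedy balancing rule, and the example labels are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cluster_kfold(cluster_ids, n_splits=3):
    """Yield (train_idx, test_idx) so that no cluster spans train and test.

    Assumption: cluster_ids[i] is the single cluster label of complex i.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    if len(members) < n_splits:
        raise ValueError("need at least as many clusters as folds")

    # Greedy balancing (an illustrative choice): place the largest clusters
    # first, each into the fold that currently holds the fewest complexes.
    folds = [[] for _ in range(n_splits)]
    for cid in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[cid])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j in range(n_splits) if j != k for i in folds[j]
        )
        yield train_idx, test_idx


# Hypothetical cluster labels for eight complexes.
example_clusters = [
    "cluster_A", "cluster_A", "cluster_A",
    "cluster_B", "cluster_B",
    "cluster_C", "cluster_C",
    "cluster_D",
]
splits = list(cluster_kfold(example_clusters, n_splits=3))
```

In a random split, complexes from the same cluster can land in both training and test folds, so a model can score well by recognizing familiar targets or ligand series. Holding out whole clusters removes that shortcut, which is why performance under such a split is a stricter test of generalization.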
Supplementary materials
Supplementary figures
Supplementary tables
Supplementary weblinks
Scripts of the MLSF generalization ability benchmark
The complete Pfam-cluster approach, 3-fold dataset split, and SHAP analysis procedures are available at https://github.com/hnlab/generalization_benckmark. All other data are available upon request.