Complex structure-free compound-protein interaction prediction for mitigating activity cliff-induced discrepancies and integrated bioactivity learning

24 March 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Protein-ligand binding affinity assessment plays a pivotal role in virtual drug screening, yet conventional data-driven approaches rely heavily on limited protein-ligand crystal structures. Structure-free compound-protein interaction (CPI) methods have emerged as competitive alternatives, leveraging extensive bioactivity data to serve as more robust scoring functions. However, these methods often overlook two critical challenges that affect data efficiency and modeling accuracy: the heterogeneity of bioactivity data due to differences in bioassay measurements, and the presence of activity cliffs (ACs), in which small chemical modifications lead to significant changes in bioactivity and which have not been thoroughly investigated in CPI modeling. To address these challenges, we present CPI2M, a large-scale CPI benchmark dataset containing approximately 2 million bioactivity endpoints across four activity types (Ki, Kd, EC50, and IC50) with AC annotations. Moreover, we developed GGAP-CPI, a structure-free deep learning model designed to mitigate the impact of ACs on CPI prediction through advanced protein representation modeling and integrated bioactivity learning. Our comprehensive evaluation demonstrates that GGAP-CPI outperforms 12 target-specific and 7 general CPI baselines across four key scenarios (general CPI prediction, rare protein prediction, transfer learning, and virtual screening) on seven benchmarks (CPI2M, MoleculeACE, CASF-2016, MerckFEP, DUD-E, DEKOIS-v2, and LIT-PCBA). Furthermore, GGAP-CPI not only delivers stable predictions by distinguishing bioactivity differences between ACs and non-ACs, but also enriches binding pocket residues and interactions, underscoring its applicability to real-world binding affinity assessments and virtual drug screening.
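
To make the notion of an activity cliff concrete, the sketch below flags compound pairs that are structurally very similar yet differ strongly in measured potency against the same target. It is a minimal illustration, not the annotation procedure used for CPI2M: the function names, the toy data, the use of plain feature sets in place of real molecular fingerprints, and the thresholds (Tanimoto similarity of at least 0.9, potency difference of at least 10-fold) are all assumptions made here for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(compounds, sim_threshold=0.9, fold_threshold=10.0):
    """Return pairs of compound names that form an activity cliff.

    compounds maps a name to (feature_set, activity_in_nM) for ONE target.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    cliffs = []
    for (name_a, (feat_a, act_a)), (name_b, (feat_b, act_b)) in combinations(
        compounds.items(), 2
    ):
        similar = tanimoto(feat_a, feat_b) >= sim_threshold
        fold_change = max(act_a, act_b) / min(act_a, act_b)
        if similar and fold_change >= fold_threshold:
            cliffs.append((name_a, name_b))
    return cliffs


# Hypothetical toy data: integer IDs stand in for substructure features.
toy = {
    "cpd_A": (frozenset(range(20)), 5.0),           # potent
    "cpd_B": (frozenset(range(19)) | {99}, 800.0),  # near-identical, weak
    "cpd_C": (frozenset(range(19)) | {98}, 6.0),    # near-identical, potent
    "cpd_D": (frozenset(range(100, 120)), 9000.0),  # unrelated scaffold
}

print(find_activity_cliffs(toy))
```

Here cpd_A and cpd_C are similar and equipotent, so they are not a cliff, whereas cpd_B forms a cliff with each of them. In practice the feature sets would come from a cheminformatics toolkit, and activities of different types (Ki, Kd, EC50, IC50) would need to be handled separately or harmonized before comparison.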

Keywords

Protein-Ligand Binding Affinity Prediction
Compound-Protein Interaction Prediction
Activity Cliff
Virtual Screening
Graph Neural Network

Supplementary materials

Supporting Information
