Abstract
Modeling molecular activity and quantitative structure-activity relationships of chemical compounds is critical in drug design. Graph neural networks, which take the molecular structure itself as the underlying graph, have shown success in predicting the biological activity of chemical compounds, guiding the selection and optimization of candidates for further development. However, current models often overlook activity cliffs (ACs), cases where structurally similar molecules exhibit markedly different bioactivities, because their latent spaces are primarily optimized for structural features.
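To make the notion of an activity cliff concrete, the following is a minimal sketch rather than the procedure used in this work. It assumes molecules are given as sets of "on" fingerprint bits and activities as pIC50-like values. The function names, the Tanimoto cutoff of 0.9, and the activity-gap cutoff of 1.0 log unit are illustrative assumptions, and the toy data are invented.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(fingerprints, activities,
                         sim_threshold=0.9, act_threshold=1.0):
    """Return index pairs that are structurally similar but differ in activity.

    Both thresholds are assumed values for illustration only.
    """
    cliffs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        divergent = abs(activities[i] - activities[j]) >= act_threshold
        if similar and divergent:
            cliffs.append((i, j))
    return cliffs


# Invented toy data: molecules 0 and 1 share 19 of 21 distinct bits,
# yet their activities differ by 2.5 log units.
fps = [set(range(0, 20)), set(range(0, 19)) | {42}, set(range(100, 120))]
pact = [8.5, 6.0, 8.4]
cliff_pairs = find_activity_cliffs(fps, pact)
```

A model whose latent space tracks only structure would place molecules 0 and 1 close together, which is exactly the case the abstract argues such models handle poorly.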
Here, we introduce AC-awareness (ACA), an inductive bias designed to enhance molecular representation learning for activity modeling. ACA jointly optimizes metric learning in the latent space and task performance in the target space, making models more sensitive to ACs. We develop \name, an AC-informed contrastive learning approach that can be integrated with any graph neural network. Experiments on 39 benchmark datasets demonstrate that AC-informed representations of chemical compounds consistently outperform standard models in bioactivity prediction across both regression and classification tasks. AC-informed models also show strong performance in predicting pharmacokinetic and safety-relevant molecular properties.
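The idea of jointly optimizing a latent-space metric term and a target-space task term can be sketched as a weighted sum of two losses. The example below is an assumption-laden illustration, not the loss defined in this work. It swaps in a standard triplet margin loss as the metric-learning term and mean squared error as the task term. The names `joint_loss`, `alpha`, and `margin`, and the choice of the cliff partner as the negative, are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull positive closer than negative."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def joint_loss(y_pred, y_true, triplets, alpha=0.5, margin=1.0):
    """Task loss (MSE) plus alpha times the mean triplet loss.

    `alpha` and `margin` are assumed hyperparameters for illustration.
    """
    task = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
    if triplets:
        metric = sum(triplet_loss(a, p, n, margin)
                     for a, p, n in triplets) / len(triplets)
    else:
        metric = 0.0
    return task + alpha * metric


# Invented toy values: the negative (an assumed cliff partner) currently
# sits closer to the anchor than the positive, so the metric term is large.
z_anchor, z_pos, z_neg = [0.0, 0.0], [3.0, 4.0], [0.0, 1.0]
loss = joint_loss(y_pred=[8.0, 6.5], y_true=[8.5, 6.0],
                  triplets=[(z_anchor, z_pos, z_neg)])
```

In this toy case the metric term penalizes an embedding that keeps a cliff partner near the anchor, while the task term keeps the predictions tied to the measured activities. The weight `alpha` controls the balance between the two.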
ACA paves the way toward activity-informed molecular representations, providing a valuable tool for the early stages of lead compound identification, refinement, and virtual screening.
Supplementary materials
Supplementary.pdf: Supplementary Figures S1 to S8 and Supplementary Tables S1 to S9