A substructure-aware loss for feature attribution in drug discovery



Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are a popular part of this toolbox; they aim to identify which substructures of a molecule are responsible for a predicted property change. However, established molecular feature attribution methods have so far performed poorly with popular deep learning models such as graph neural networks (GNNs), particularly when compared with simpler alternatives such as random forests coupled with atom masking. To mitigate this problem, we present a modification to the regression objective of GNNs that specifically accounts for common core structures between pairs of molecules. The proposed approach shows higher accuracy on a recently proposed explainability benchmark. We expect this methodology to be useful in drug discovery pipelines, particularly in lead optimization efforts where chemical series of related compounds are investigated.

Version notes

Updated public GitHub link for supporting code.


Supplementary weblinks

Supplementary materials and code for the main manuscript