Explaining compound activity predictions with a substructure-aware loss for graph neural networks


Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices for identifying which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far performed poorly for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, this work proposes a modification of the regression objective for GNNs that explicitly accounts for common core structures between pairs of molecules. The presented approach showed higher accuracy on a recently proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
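
The abstract does not give the exact form of the modified objective, so the following is only an illustrative sketch of what a pairwise, substructure-aware regression loss could look like. It assumes, hypothetically, that the model output decomposes into per-atom contributions, that each atom of a molecule pair is flagged as belonging to the common core or not, and that a weight `lam` balances the two terms. The function name `pair_loss` and all of its parameters are placeholders chosen for illustration, not the authors' implementation.

```python
from typing import Sequence


def pair_loss(
    atom_contrib_i: Sequence[float],
    atom_contrib_j: Sequence[float],
    in_core_i: Sequence[bool],
    in_core_j: Sequence[bool],
    y_i: float,
    y_j: float,
    lam: float = 1.0,
) -> float:
    """Illustrative pairwise loss (an assumption, not the published objective).

    The first term is an ordinary squared-error regression loss on both
    molecules. The second term asks that the contributions of atoms OUTSIDE
    the common core explain the activity difference between the pair.
    """
    # Assumed readout: the prediction is the sum of per-atom contributions.
    pred_i = sum(atom_contrib_i)
    pred_j = sum(atom_contrib_j)
    regression = (pred_i - y_i) ** 2 + (pred_j - y_j) ** 2

    # Sum the contributions of atoms that are not part of the shared core.
    uncommon_i = sum(c for c, core in zip(atom_contrib_i, in_core_i) if not core)
    uncommon_j = sum(c for c, core in zip(atom_contrib_j, in_core_j) if not core)

    # Penalize mismatch between the uncommon-part difference and the
    # measured activity difference of the pair.
    substructure = ((uncommon_i - uncommon_j) - (y_i - y_j)) ** 2

    return regression + lam * substructure


# Toy pair: two core atoms plus one differing substituent atom each.
loss = pair_loss(
    atom_contrib_i=[1.0, 2.0, 0.5],
    atom_contrib_j=[1.0, 2.0, -0.5],
    in_core_i=[True, True, False],
    in_core_j=[True, True, False],
    y_i=3.5,
    y_j=2.5,
)
```

In this toy pair both predictions match their targets and the substituent contributions account for the whole activity difference, so both terms vanish. A model that instead placed the difference on core atoms would be penalized by the second term even if its predictions were otherwise identical.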

Version notes

Production-ready version before submission.


Supplementary material

Supporting information
Supporting plots and tables to the main manuscript

Supplementary weblinks

Supplementary code to the main manuscript