Theoretical and Computational Chemistry

InFrag: Using Attribution-based Explainability to Guide Deep Molecular Optimization



The recently proposed genetic expert-guided learning (GEGL) framework has demonstrated impressive performance on several de novo molecular design tasks. Despite these state-of-the-art results, the system relies on a hand-designed genetic expert. Although hand-crafted experts allow the chemical space to be navigated efficiently, designing them requires significant effort, and they may contain inherent biases that can slow down convergence or even lead to sub-optimal solutions. In this work, we propose a novel genetic expert named InFrag, which is free of design rules and generates new molecules by combining promising molecular fragments. Fragments are obtained using an additional graph convolutional neural network that computes an attribution for each atom of a given molecule. Molecular substructures that contribute positively to the task score are kept and combined to propose novel molecules. We experimentally demonstrate that, within the GEGL framework, our attribution-based genetic expert is either competitive with or outperforms the original expert-designed genetic expert on goal-directed optimization tasks. When the number of optimization rounds is limited to one and three, performance increases of approximately 43% and 20%, respectively, are observed compared with the baseline genetic expert. Furthermore, we empirically show that combining several experts that share a fixed sampling budget at each optimization round generally improves or maintains the overall performance of the framework.
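The fragment-selection idea in the abstract (keep the substructures whose atoms contribute positively to the score) can be illustrated with a minimal sketch. It assumes the per-atom attributions have already been produced by some attribution model and that the molecule is given as a plain adjacency list; the function names `positive_fragments` and `split_budget`, the threshold of 0, and the even division of the sampling budget among experts are all hypothetical illustrations, not the authors' implementation. A real pipeline would also cut and recombine bonds with a cheminformatics toolkit, which is not shown here.

```python
from collections import deque
from typing import Dict, List, Sequence, Set


def positive_fragments(
    adjacency: Dict[int, List[int]],
    attributions: Sequence[float],
    threshold: float = 0.0,  # hypothetical cut-off; atoms above it are kept
) -> List[Set[int]]:
    """Return connected groups of atoms whose attribution exceeds `threshold`.

    `adjacency` maps each atom index to its bonded neighbours, and
    `attributions[i]` is an (assumed, precomputed) per-atom score.
    """
    keep = {i for i, a in enumerate(attributions) if a > threshold}
    fragments: List[Set[int]] = []
    seen: Set[int] = set()
    for start in sorted(keep):
        if start in seen:
            continue
        # Breadth-first search restricted to positively attributed atoms,
        # so each connected component becomes one candidate fragment.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nb in adjacency.get(atom, []):
                if nb in keep and nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        fragments.append(component)
    return fragments


def split_budget(total: int, n_experts: int) -> List[int]:
    """Divide a fixed per-round sampling budget as evenly as possible (assumed policy)."""
    base, rest = divmod(total, n_experts)
    return [base + (1 if i < rest else 0) for i in range(n_experts)]


if __name__ == "__main__":
    # Toy six-atom chain 0-1-2-3-4-5 with made-up attributions.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    print(positive_fragments(chain, [0.4, 0.2, -0.1, 0.3, 0.5, -0.2]))
    print(split_budget(512, 3))
```

On the toy chain, atoms 2 and 5 have negative attributions, so the sketch returns the two fragments {0, 1} and {3, 4}, and a budget of 512 samples shared by three experts is split as [171, 171, 170].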



Supplementary material

Supporting information: InFrag: Using Attribution-based Explainability to Guide Deep Molecular Optimization
Supporting information for the main manuscript

Supplementary weblinks

Link to source code
This link redirects to the source code and data used for the described experiments.