Deciphering molecular embeddings with centered kernel alignment

14 May 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The creation of effective models is of utmost importance in various scientific and engineering domains. However, analyzing such models, especially nonlinear ones, poses significant challenges. In this context, centered kernel alignment (CKA) has emerged as a promising model analysis tool that assesses the independence between two embeddings. CKA's efficacy depends on the selection of a kernel that adequately captures the underlying properties of the compared models. To adapt CKA to cheminformatics, we examine the properties of the linear and random forest (RF) kernels with respect to multilayer perceptrons (MLPs) and RFs. Furthermore, we demonstrate the utility of CKA in cheminformatics in three case studies in which we (1) investigate why optimizing the radius of circular fingerprints beyond two bonds results in only minor changes in model performance, (2) analyze the dependence between physicochemical properties and the molecular representations induced by graph neural networks (GNNs) that use addition as the readout operation, and (3) compare different graph readout operations in GNNs.
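
As a rough illustration of the quantity the abstract refers to, the sketch below computes CKA in its commonly used form: the Hilbert-Schmidt independence criterion (HSIC) between two centered Gram matrices, normalized so that the score lies between 0 and 1. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the synthetic data, and the use of NumPy are all illustrative. Only the linear kernel is shown; an RF kernel would enter the same way, as a precomputed Gram matrix passed to `cka`.

```python
import numpy as np


def center_gram(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def cka(K, L):
    """CKA between two n x n Gram matrices computed on the same n samples."""
    Kc, Lc = center_gram(K), center_gram(L)
    # For symmetric matrices, sum(Kc * Lc) equals trace(Kc @ Lc), i.e. HSIC
    # up to a constant factor that cancels in the ratio below.
    hsic_kl = np.sum(Kc * Lc)
    return hsic_kl / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))


def linear_cka(X, Y):
    """CKA with the linear kernel k(x, x') = <x, x'> on embeddings X and Y."""
    return cka(X @ X.T, Y @ Y.T)


# Synthetic embeddings (hypothetical data, 50 samples each).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

# A rotated and rescaled copy of X: same information, different coordinates.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Y_rotated = 3.0 * X @ Q

# An unrelated embedding with a different dimensionality.
Y_random = rng.normal(size=(50, 16))

score_rotated = linear_cka(X, Y_rotated)
score_random = linear_cka(X, Y_random)
print(f"rotated copy: {score_rotated:.3f}, unrelated: {score_random:.3f}")
```

Because linear CKA is invariant to orthogonal transformations and isotropic scaling, the rotated copy scores 1 up to floating-point error, while the unrelated embedding scores lower. The two embeddings may have different dimensionalities, since only their n x n Gram matrices are compared.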

Keywords

centered kernel alignment
random forest kernel
machine learning
graph neural networks
molecular representations
representational alignment

Supplementary materials

Supporting Information: The Supporting Information contains further details on model performance (PDF).
