AABBA: Atom–Atom Bond–Bond Bond–Atom Graph Kernel for Machine Learning on Molecules and Materials

05 September 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Graphs are among the most natural and powerful representations of molecules. They are natural because they correspond intuitively to skeletal formulas, the language used by chemists worldwide, and powerful because they are highly expressive both globally (molecular topology) and locally (atomic properties). Graph kernels transform molecular graphs into fixed-length vectors, which can be used as fingerprints in machine learning (ML) models. To date, such kernels have mostly focused on the atomic nodes of the graph. In this work, we developed an extended graph kernel that computes atom–atom, bond–bond, and bond–atom (AABBA) autocorrelations. We evaluated the resulting AABBA representations on a transition metal complex benchmark, chosen because these compounds are more complex than organic molecules. In particular, we tested different flavors of the AABBA kernel in predicting the energy barriers and bond distances of the Vaska’s complex dataset (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline that includes only atom–atom autocorrelations. Dimensionality reduction studies also showed that many of the most relevant features come from the bond–bond and bond–atom autocorrelations. We believe that the AABBA graph kernel can accelerate the discovery of chemical compounds and inspire novel molecular representations in which both atomic and bond properties play an important role.
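This page does not give the kernel's formulas, so the following is only a minimal sketch of what atom–atom, bond–bond, and bond–atom autocorrelations can look like, assuming the common product form AC(d) = Σ_i Σ_j P_i P_j δ(d_ij, d) summed over all ordered pairs in the graph (no metal-centered scope). Several choices are assumptions, not the authors' definitions: bond–bond distances are measured on the line graph (bonds are neighbours if they share an atom), the distance between a bond and an atom is taken as the smaller hop distance from either bond endpoint, and the properties used (atomic number on atoms, bond order on bonds) are illustrative. All function names (`hop_distances`, `autocorrelation`, `line_graph`, `bond_atom_autocorrelation`, `aabba_vector`) are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: topological (hop) distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def autocorrelation(adj, prop, depth):
    """Product autocorrelation over ordered pairs (i, j): AC[d] = sum of P_i * P_j with d_ij = d."""
    ac = [0.0] * (depth + 1)
    for i in adj:
        for j, d in hop_distances(adj, i).items():
            if d <= depth:
                ac[d] += prop[i] * prop[j]
    return ac


def line_graph(bonds):
    """Bond adjacency (ASSUMPTION): two bonds are neighbours if they share an atom."""
    adj = {k: [] for k in range(len(bonds))}
    for a in range(len(bonds)):
        for b in range(a + 1, len(bonds)):
            if set(bonds[a]) & set(bonds[b]):
                adj[a].append(b)
                adj[b].append(a)
    return adj


def bond_atom_autocorrelation(atom_adj, bonds, atom_prop, bond_prop, depth):
    """ASSUMPTION: bond-to-atom distance = min hop distance from either bond endpoint."""
    ac = [0.0] * (depth + 1)
    for k, (u, v) in enumerate(bonds):
        du, dv = hop_distances(atom_adj, u), hop_distances(atom_adj, v)
        for a in atom_adj:
            d = min(du.get(a, float("inf")), dv.get(a, float("inf")))
            if d <= depth:
                ac[int(d)] += bond_prop[k] * atom_prop[a]
    return ac


def aabba_vector(atom_adj, bonds, atom_prop, bond_prop, depth):
    """Concatenate atom-atom, bond-bond, and bond-atom blocks into one fixed-length vector."""
    aa = autocorrelation(atom_adj, atom_prop, depth)
    bb = autocorrelation(line_graph(bonds), bond_prop, depth)
    ba = bond_atom_autocorrelation(atom_adj, bonds, atom_prop, bond_prop, depth)
    return aa + bb + ba


# Toy heavy-atom graph for a C-C-O chain (illustrative properties only).
bonds = [(0, 1), (1, 2)]
atom_adj = {0: [1], 1: [0, 2], 2: [1]}
Z = {0: 6, 1: 6, 2: 8}          # atomic numbers on atoms
order = {0: 1.0, 1: 1.0}        # bond orders, keyed by bond index
print(aabba_vector(atom_adj, bonds, Z, order, depth=2))
# -> [136.0, 168.0, 96.0, 2.0, 2.0, 0.0, 26.0, 14.0, 0.0]
```

With one property per block, the vector has 3 × (depth + 1) entries regardless of molecule size. This fixed length is what lets such a representation serve as an ML fingerprint. Adding more properties simply appends more blocks.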

Keywords

graph kernel
metal complexes
autocorrelation
neural networks
gradient boosting
Gaussian processes
feature engineering
feature selection
dimensionality reduction
molecular graphs
property prediction

Supplementary materials

Supporting Information
The Supporting Information provides further information about the maximal metal-centered depths, computational details of the neural network (NN), gradient boosting machine (GBM), and Gaussian process (GP) models, and additional details about feature relevance and dimensionality reduction.