Neural Mulliken Analysis: Molecular Graphs from Density Matrices for QSPR on Raw Quantum-Chemical Data

06 December 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Since the introduction of Graph Neural Networks (GNNs), molecular graphs have become useful tools in chemical informatics. However, in property prediction tasks, graph embeddings often still resemble traditional fingerprints. Here, we propose a straightforward approach to provide modern GNNs with raw quantum-chemical data, enabling efficient solutions to a range of chemical machine-learning problems. The central role is played by the 1-electron density matrix derived from quantum-chemical calculations (e.g. Hartree-Fock, DFT). The diagonal blocks of the density matrix are used as embeddings for the atomic nodes (“atoms”) in the molecular graph. Unlike conventional molecular graph representations, the chemical bond concept is not used. Instead, an additional set of nodes (“links”) between pairs of atoms is introduced. Their embeddings are the off-diagonal blocks of the density matrix, each belonging to a particular atom pair. Directed graph edges run from “atoms” to “links” and from “links” to “atoms”. The embeddings of the edges are derived from the basis set overlap matrix. The overlaps serve two purposes: first, they encode structural information such as distances and angles; second, they act as weights in pooling operations. The use of element-wise multiplication of densities and overlaps is inspired by the Mulliken population analysis scheme. The proposed concept was tested on the Solubility Challenge (2008) by Llinàs et al. (DOI: 10.1021/ci800058v). A GNN was trained on a small dataset comprising 94 aqueous solubilities of drug-like molecules and subsequently used to predict the aqueous solubilities of 28 test molecules. The model achieved an RMSE of 0.68 and an R² of 0.76, outperforming all methods proposed at that time. In our view, the approach is promising: even in this preliminary test, the proposed architecture appears to achieve state-of-the-art accuracy.
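The block structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `split_blocks` and `mulliken_populations`, the `ao_to_atom` index array, and the toy H2 matrices (one basis function per atom, overlap 0.66) are assumptions made for illustration only. The sketch splits a density matrix P into diagonal blocks ("atoms") and off-diagonal blocks ("links"), and applies the element-wise product of P and the overlap matrix S that underlies Mulliken population analysis.

```python
import numpy as np


def split_blocks(matrix, ao_to_atom):
    """Split a square AO-basis matrix into per-atom-pair blocks.

    ao_to_atom[i] is the index of the atom that basis function i sits on
    (hypothetical bookkeeping array; a real code would take it from the
    quantum-chemistry package that produced the matrix).
    Returns {(A, B): block}; A == B are "atom" blocks, A != B "link" blocks.
    """
    ao_to_atom = np.asarray(ao_to_atom)
    atoms = np.unique(ao_to_atom)
    blocks = {}
    for a in atoms:
        rows = np.where(ao_to_atom == a)[0]
        for b in atoms:
            cols = np.where(ao_to_atom == b)[0]
            blocks[(int(a), int(b))] = matrix[np.ix_(rows, cols)]
    return blocks


def mulliken_populations(P, S, ao_to_atom):
    """Gross Mulliken populations: N_A = sum_{mu in A} sum_nu P[mu, nu] * S[mu, nu]."""
    ao_to_atom = np.asarray(ao_to_atom)
    weighted = P * S  # element-wise product of density and overlap
    pops = np.zeros(ao_to_atom.max() + 1)
    # Sum each row over all basis functions, then accumulate rows per atom.
    np.add.at(pops, ao_to_atom, weighted.sum(axis=1))
    return pops


# Toy H2 example in a minimal basis (illustrative numbers, not from the paper).
s = 0.66                                   # assumed overlap between the two 1s functions
S = np.array([[1.0, s], [s, 1.0]])         # overlap matrix
c = np.ones(2) / np.sqrt(2.0 * (1.0 + s))  # normalized bonding orbital
P = 2.0 * np.outer(c, c)                   # closed-shell density matrix, 2 electrons
ao_to_atom = [0, 1]                        # basis function 0 on atom 0, 1 on atom 1

blocks = split_blocks(P, ao_to_atom)
atom_embeddings = {k: v for k, v in blocks.items() if k[0] == k[1]}
link_embeddings = {k: v for k, v in blocks.items() if k[0] != k[1]}
populations = mulliken_populations(P, S, ao_to_atom)
print(populations, populations.sum())
```

In this toy case the populations sum to the trace of PS, i.e. the number of electrons, which is the property that makes the overlap-weighted sum a natural pooling operation over density blocks.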

Keywords

graph neural networks
solubility
DFT
neural networks
