Neural Mulliken Analysis: Molecular Graphs from Density Matrices for QSPR on Raw Quantum-Chemical Data

18 February 2025, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

This work introduces molecular graphs derived from the one-electron density matrix, as part of a broader effort to explore whether awareness of electronic structure lets a single model combine generalization from small data with optimal feature learning. Diagonal matrix blocks serve as atomic node embeddings, while off-diagonal blocks provide embeddings for “link” nodes between atom pairs. In a minimal basis, these embeddings have dimensions of only 45 and 81, yet no information is lost. The overlap matrix is used in edge embeddings to encode structural information and as weights for pooling operations. Additionally, element-wise multiplication performed during pooling may provide access to electronic charges, similar to Mulliken population analysis. A GNN trained on 94 drug-like molecules from the Solubility Challenge (Llinàs et al., 2008) demonstrated improved solubility prediction accuracy (RMSE 0.63, R² 0.79). Combined with existing techniques for predicting electron density from molecular structure, this approach is promising for a range of chemical machine-learning problems.
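The block structure and the Mulliken-style pooling described in the abstract can be illustrated with a small NumPy sketch. This is not the preprint's implementation. It assumes each atom is padded to 9 basis functions, which is an inference from the stated dimensions (45 unique entries of a symmetric 9×9 diagonal block, 81 entries of a full 9×9 off-diagonal block). The matrices are synthetic toy data, and all function names are hypothetical.

```python
import numpy as np

N_AO = 9  # ASSUMPTION: basis functions per atom, inferred from 45 = 9*10/2 and 81 = 9*9


def toy_matrices(n_atoms, n_occ, seed=0):
    """Build a synthetic overlap matrix S and closed-shell density matrix P."""
    rng = np.random.default_rng(seed)
    n = n_atoms * N_AO
    a = rng.normal(size=(n, n))
    s = a @ a.T + n * np.eye(n)          # symmetric positive definite
    d = np.sqrt(np.diag(s))
    s = s / np.outer(d, d)               # unit diagonal, like normalized AOs
    # Loewdin orthogonalization: C = S^(-1/2) U satisfies C^T S C = I
    w, v = np.linalg.eigh(s)
    s_inv_half = v @ np.diag(w ** -0.5) @ v.T
    u, _ = np.linalg.qr(rng.normal(size=(n, n)))
    c_occ = (s_inv_half @ u)[:, :n_occ]
    p = 2.0 * c_occ @ c_occ.T            # two electrons per occupied orbital
    return p, s


def block(m, a, b):
    """Return the (atom a, atom b) sub-block of a full AO matrix."""
    return m[a * N_AO:(a + 1) * N_AO, b * N_AO:(b + 1) * N_AO]


def atom_embedding(p, a):
    """Upper triangle of the symmetric diagonal block: 45 numbers."""
    return block(p, a, a)[np.triu_indices(N_AO)]


def link_embedding(p, a, b):
    """Full off-diagonal block for an atom pair: 81 numbers."""
    return block(p, a, b).ravel()


def mulliken_populations(p, s, n_atoms):
    """Gross atomic populations from the element-wise product P * S."""
    per_ao = (p * s).sum(axis=1)         # sum_nu P[mu, nu] * S[mu, nu]
    return per_ao.reshape(n_atoms, N_AO).sum(axis=1)


P, S = toy_matrices(n_atoms=3, n_occ=5)
populations = mulliken_populations(P, S, n_atoms=3)
```

Because S is symmetric, summing the element-wise product P * S over all entries equals Tr(PS), so the atomic populations add up to the electron count (here 2 × 5 = 10). The standard Mulliken charge of an atom is then its nuclear charge minus its population.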

Keywords

graph neural networks
solubility
DFT
neural networks
