Abstract
Here, molecular graphs derived from the one-electron density matrix are introduced as part of a broader effort to explore whether electronic-structure awareness allows a single model to combine generalization from small data with optimal feature learning. Diagonal blocks of the matrix serve as atomic node embeddings, while off-diagonal blocks provide embeddings for “link” nodes placed between pairs of atoms. In a minimal basis, these embeddings have dimensions of only 45 and 81, respectively, yet no information is lost. The overlap matrix is used in edge embeddings to encode structural information and as weights for pooling operations. Additionally, the element-wise multiplication performed during pooling may provide access to electronic charges, similar to Mulliken population analysis. A GNN trained on 94 drug-like molecules from the Solubility Challenge (Llinàs et al., 2008) achieved improved solubility prediction accuracy (RMSE 0.63, R² 0.79). Combined with existing techniques for predicting electron density from molecular structure, this approach is promising for a variety of chemical machine-learning problems.
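The dimensions 45 and 81 are consistent with a minimal basis of 9 functions per atom (1s, 2s, 2p×3, 3s, 3p×3, enough for elements up to the third period). A symmetric 9×9 diagonal block needs only its upper triangle, 9·10/2 = 45 values, while an off-diagonal 9×9 block needs all 81, and the block for the reverse pair is simply its transpose. The sketch below is an illustration under those assumptions, not the authors' code: the fixed 9-function padding per atom, the helper names (`atom_slice`, `density_to_graph`, `graph_to_density`), and the choice to give every atom pair a link node are all hypothetical. The inverse map shows why the node/link representation loses no information.

```python
import numpy as np

# Assumption: every atom is padded to 9 minimal-basis functions
# (1s, 2s, 2p x3, 3s, 3p x3), laid out contiguously atom by atom.
N_BF = 9

def atom_slice(i):
    """Rows/columns of the density matrix belonging to atom i (hypothetical layout)."""
    return slice(i * N_BF, (i + 1) * N_BF)

def node_embedding(P, i):
    """Upper triangle (with diagonal) of atom i's symmetric diagonal block: 9*10/2 = 45 values."""
    rows, cols = np.triu_indices(N_BF)
    return P[atom_slice(i), atom_slice(i)][rows, cols]

def link_embedding(P, i, j):
    """Full off-diagonal block between atoms i and j: 9*9 = 81 values."""
    return P[atom_slice(i), atom_slice(j)].reshape(-1)

def density_to_graph(P):
    """Split a symmetric density matrix into atomic node and link node embeddings."""
    n_atoms = P.shape[0] // N_BF
    nodes = [node_embedding(P, i) for i in range(n_atoms)]
    # Hypothetical choice: one link node for every pair i < j.
    links = {(i, j): link_embedding(P, i, j)
             for i in range(n_atoms) for j in range(i + 1, n_atoms)}
    return nodes, links

def graph_to_density(nodes, links):
    """Inverse map: rebuild P from the embeddings, showing that nothing is lost."""
    n_atoms = len(nodes)
    P = np.zeros((n_atoms * N_BF, n_atoms * N_BF))
    rows, cols = np.triu_indices(N_BF)
    for i, v in enumerate(nodes):
        block = np.zeros((N_BF, N_BF))
        block[rows, cols] = v
        # Mirror the strict upper triangle to restore the full symmetric block.
        P[atom_slice(i), atom_slice(i)] = block + np.triu(block, 1).T
    for (i, j), v in links.items():
        block = v.reshape(N_BF, N_BF)
        P[atom_slice(i), atom_slice(j)] = block
        P[atom_slice(j), atom_slice(i)] = block.T  # P is symmetric, so P_ji = P_ij^T
    return P

# Usage with a random symmetric stand-in for a 3-atom density matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(3 * N_BF, 3 * N_BF))
P = A + A.T
nodes, links = density_to_graph(P)
print(len(nodes[0]), len(links[(0, 1)]))
print(np.allclose(graph_to_density(nodes, links), P))
```

A real implementation would likely skip link nodes for distant, non-overlapping atom pairs; keeping all pairs here simply makes the round trip exact.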
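The link to Mulliken analysis comes from the fact that the gross population of atom A is N_A = Σ_{μ∈A} Σ_ν P_{μν} S_{νμ}, and since S is symmetric this is just a row-wise sum of the element-wise product P ∘ S, pooled over the basis functions of each atom; the partial charge is then q_A = Z_A − N_A. The sketch below shows standard Mulliken analysis written in that "multiply element-wise, then pool" form. It is not the paper's pooling layer, and the function names and the H2 overlap value s = 0.66 are illustrative assumptions.

```python
import numpy as np

def mulliken_populations(P, S, atom_of_bf):
    """Gross Mulliken population per atom: sum over mu on atom A, all nu, of P[mu, nu] * S[mu, nu]."""
    PS = P * S                   # element-wise (Hadamard) product; S is symmetric
    per_bf = PS.sum(axis=1)      # gross population of each basis function mu
    # Pool basis-function contributions onto their atoms.
    return np.bincount(np.asarray(atom_of_bf), weights=per_bf)

def mulliken_charges(P, S, atom_of_bf, Z):
    """Partial charges q_A = Z_A - N_A."""
    return np.asarray(Z, dtype=float) - mulliken_populations(P, S, atom_of_bf)

# Illustrative H2 in a minimal basis: one 1s function per atom,
# an assumed round overlap s, and a doubly occupied bonding orbital.
s = 0.66
S = np.array([[1.0, s], [s, 1.0]])
P = np.full((2, 2), 1.0 / (1.0 + s))  # P = 2 c c^T with c = [1, 1] / sqrt(2 (1 + s))
atom_of_bf = [0, 1]

print(mulliken_populations(P, S, atom_of_bf))
print(mulliken_charges(P, S, atom_of_bf, Z=[1, 1]))
print(np.sum(P * S))  # total electron count, Tr(PS)
```

For this homonuclear example each atom's population is 1 and each charge is 0 (up to rounding), and the summed product gives the 2 electrons of H2. Off-diagonal (link) contributions are split equally between the two atoms, which is exactly Mulliken's partitioning rule.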