Abstract
Since the introduction of Graph Neural Networks (GNNs), molecular graphs have become useful tools in chemical informatics. However, in property prediction tasks, graph embeddings often still resemble traditional fingerprints. Here, we propose a straightforward approach to provide modern GNNs with raw quantum-chemical data, enabling efficient solutions to a range of chemical machine-learning problems. The central role is played by the one-electron density matrix derived from quantum-chemical calculations (e.g. Hartree-Fock, DFT). The diagonal blocks of the density matrix are used as embeddings for the atomic nodes (“atoms”) in the molecular graph. Unlike conventional molecular graph representations, the chemical bond concept is not used. Instead, an additional set of nodes (“links”) between pairs of atoms is introduced. Their embeddings are the off-diagonal blocks of the density matrix that correspond to particular atom pairs. Directed graph edges connect “atoms” to “links” or “links” to “atoms”. The embeddings of the edges are derived from the basis set overlap matrix. The overlaps serve two purposes: first, they encode structural information such as distances and angles; second, they act as weights in pooling operations. The use of element-wise multiplication of densities and overlaps is inspired by the Mulliken population analysis scheme. The proposed concept was tested on the Solubility Challenge (2008) by Llinàs et al. (DOI: 10.1021/ci800058v). A GNN was trained on a small dataset comprising 94 aqueous solubilities of drug-like molecules and subsequently used to predict the aqueous solubilities of 28 test molecules. The model achieved an RMSE of 0.68 and an R² of 0.76, outperforming all methods proposed at that time. In our view, this is a promising approach, particularly since the proposed architecture appears to reach state-of-the-art accuracy even in a preliminary test.
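The block partitioning and the Mulliken-style element-wise product described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `ao_atom` array (a hypothetical mapping from each basis function to its atom index), and the toy two-function H2-like matrices are assumptions made for illustration only. How the overlap blocks are assigned to individual directed edges is not specified in the abstract, so the sketch simply returns them per atom pair.

```python
import numpy as np

def split_blocks(P, S, ao_atom):
    """Partition density matrix P and overlap matrix S into atom-wise blocks.

    ao_atom[i] is the index of the atom carrying basis function i
    (a hypothetical input; a real code would take it from the basis set).
    """
    ao_atom = np.asarray(ao_atom)
    atoms = [int(a) for a in np.unique(ao_atom)]
    idx = {a: np.where(ao_atom == a)[0] for a in atoms}

    # "Atom" nodes: diagonal blocks of the density matrix.
    atom_nodes = {a: P[np.ix_(idx[a], idx[a])] for a in atoms}

    # "Link" nodes: off-diagonal density blocks, one per atom pair,
    # together with the matching overlap blocks.
    link_nodes, link_overlaps = {}, {}
    for a in atoms:
        for b in atoms:
            if a < b:
                link_nodes[(a, b)] = P[np.ix_(idx[a], idx[b])]
                link_overlaps[(a, b)] = S[np.ix_(idx[a], idx[b])]
    return atom_nodes, link_nodes, link_overlaps

def mulliken_populations(P, S, ao_atom):
    """Gross Mulliken populations: element-wise P * S, summed per atom."""
    ao_atom = np.asarray(ao_atom)
    gross_ao = (P * S).sum(axis=1)  # gross population of each basis function
    return np.bincount(ao_atom, weights=gross_ao, minlength=ao_atom.max() + 1)

# Toy example (assumed numbers): two atoms, one basis function each,
# one doubly occupied bonding orbital with overlap s between the functions.
s = 0.6593
S = np.array([[1.0, s], [s, 1.0]])
c = np.ones(2) / np.sqrt(2.0 * (1.0 + s))  # normalized bonding orbital
P = 2.0 * np.outer(c, c)                   # closed-shell density matrix
ao_atom = [0, 1]

atom_nodes, link_nodes, link_overlaps = split_blocks(P, S, ao_atom)
populations = mulliken_populations(P, S, ao_atom)
print(populations, populations.sum())
```

In this toy case the element-wise product sums to the total electron count, which is the property that makes overlap-weighted pooling of density blocks a natural choice.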