Abstract
We present a machine learning model for high-throughput energetic ranking of charged molecular conformers. Based on the ConfRank approach, the model is trained in a pairwise fashion to predict energy differences between pairs of conformers. By conditioning the model on dataset embedding vectors, we can train it on two different reference levels simultaneously, which allows for a larger training dataset and lets the model emulate multiple reference methods. In particular, we train our model on a large subset of the SPICE 2.0.1 dataset, with references at the ωB97M-D3(BJ)/def2-TZVPPD range-separated hybrid meta-GGA DFT level, and on a self-developed conformer dataset based on the GEOM dataset, with r²SCAN-3c references. The result is a single multi-fidelity model that reproduces both reference levels to within typical ML model errors for small- and medium-sized molecules containing the following elements: H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, and I. By including partial atomic charges obtained from the electronegativity equilibration charge model, our model incorporates information about the charge distribution in a molecule, allowing the treatment of charged closed-shell species and the explicit treatment of electrostatic interactions. We test the ranking capability of the model on various datasets, paying special attention to molecular charges of −1, 0, and +1. Throughout all tests, we find our model to be as accurate as the current AIMNet2 and MACE-OFF23(L) models while requiring an order of magnitude fewer parameters, and to match the robustness of the state-of-the-art semi-empirical quantum method GFN2-xTB.
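The pairwise view of conformer ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two metrics, and the energy values are hypothetical and were chosen only to show how energy differences between conformer pairs can be compared against a reference.

```python
from itertools import combinations


def pairwise_differences(energies):
    """Return {(i, j): E_i - E_j} for all conformer pairs with i < j."""
    return {
        (i, j): energies[i] - energies[j]
        for i, j in combinations(range(len(energies)), 2)
    }


def pairwise_mae(predicted, reference):
    """Mean absolute error between predicted and reference energy differences."""
    d_pred = pairwise_differences(predicted)
    d_ref = pairwise_differences(reference)
    return sum(abs(d_pred[k] - d_ref[k]) for k in d_ref) / len(d_ref)


def pairwise_accuracy(predicted, reference):
    """Fraction of pairs whose energetic order is predicted correctly.

    Pairs with a zero difference (ties) are counted as incorrect here,
    which is an arbitrary choice made for this sketch.
    """
    d_pred = pairwise_differences(predicted)
    d_ref = pairwise_differences(reference)
    correct = sum(1 for k in d_ref if d_pred[k] * d_ref[k] > 0)
    return correct / len(d_ref)


# Hypothetical energies (arbitrary units) for four conformers of one molecule.
reference = [0.0, 1.2, 2.5, 4.0]
predicted = [0.1, 1.0, 2.9, 3.5]

mae = pairwise_mae(predicted, reference)
accuracy = pairwise_accuracy(predicted, reference)

# A constant offset in the predicted energies cancels in every difference,
# so only relative energies matter for pairwise metrics.
shifted = [e + 10.0 for e in predicted]
mae_shifted = pairwise_mae(shifted, reference)
```

Because only differences enter these quantities, a model evaluated or trained this way does not need to reproduce absolute energies, which is one reason a pairwise formulation suits ranking tasks.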
Supplementary materials
Title
Supporting Information
Description
The Supporting Information contains additional material, such as explanations of the loss function and the statistical metrics, as well as tabular overviews of the test datasets.