ConfRank+: Extending Conformer Ranking to Charged Molecules

05 June 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

We present a machine learning model for high-throughput energetic ranking of charged molecular conformers. Based on the ConfRank approach, the model is trained in a pairwise fashion to predict energy differences between pairs of conformers. By conditioning the model on dataset embedding vectors, we can train it on two different reference levels simultaneously, which enlarges the training dataset and lets a single model emulate multiple reference methods. Specifically, we train on a large subset of the SPICE 2.0.1 dataset, with reference data at the ωB97M-D3(BJ)/def2-TZVPPD level (a range-separated hybrid meta-GGA DFT method), and on a self-developed conformer dataset based on the GEOM dataset, with r²SCAN-3c reference data. The result is a single multi-fidelity model that reproduces both reference levels to within ML-typical model errors for small- and medium-sized molecules containing the following elements: H, Li, B, C, N, O, F, Na, Mg, Si, P, S, Cl, K, Ca, Br, I. By including partial atomic charges obtained from the electronegativity equilibration (EEQ) charge model, our model incorporates information about the charge distribution in a molecule, which allows it to treat charged closed-shell species and to describe electrostatic interactions explicitly. We test the ranking capability of the model on various datasets, paying special attention to molecular charges of -1, 0, and +1. Throughout all tests, we find our model to be as accurate as the current AIMNet2 and MACE-OFF23(L) models while requiring an order of magnitude fewer parameters, and to match the robustness of the state-of-the-art semi-empirical quantum method GFN2-xTB.
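
To make the pairwise view concrete, the following is a minimal Python sketch of how predicted conformer energies can be compared against reference energies through pairwise differences. It is an illustration only, not the loss function or evaluation code of the paper (those are described in the Supporting Information); the function names and the example energies are hypothetical.

```python
from itertools import combinations


def pairwise_differences(energies):
    """Return {(i, j): E_j - E_i} for all conformer pairs with i < j."""
    return {
        (i, j): energies[j] - energies[i]
        for i, j in combinations(range(len(energies)), 2)
    }


def pairwise_mae(predicted, reference):
    """Mean absolute error over all pairwise energy differences."""
    pred = pairwise_differences(predicted)
    ref = pairwise_differences(reference)
    return sum(abs(pred[pair] - ref[pair]) for pair in ref) / len(ref)


def ranking(energies):
    """Conformer indices sorted from lowest to highest energy."""
    return sorted(range(len(energies)), key=lambda i: energies[i])


# Hypothetical relative energies (kcal/mol) for four conformers of one molecule.
reference = [0.0, 1.2, 0.4, 2.5]
predicted = [0.1, 1.0, 0.6, 2.4]

print("reference ranking:", ranking(reference))
print("predicted ranking:", ranking(predicted))
print("pairwise MAE:", pairwise_mae(predicted, reference))
```

Because only differences enter the comparison, adding a constant offset to all predicted energies of a molecule leaves both the pairwise error and the ranking unchanged. This is one reason pairwise targets are a natural fit for conformer ranking.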

Keywords

Conformer ranking
pairwise training
graph neural network
charge model
EEQ model
CREST
CENSO
high-throughput
conformer-search

Supplementary materials

Supporting Information: The Supporting Information contains additional material such as explanations of the loss function, statistical metrics, and tabular overviews of the test datasets.
