Acceleration of semi-empirical electronic structure theory calculations on consumer-grade GPUs using mixed precision density matrix purification

14 February 2025, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

For semi-empirical electronic structure methods, solving the Roothaan-Hall equations to determine the one-electron density matrix is generally the computational bottleneck. Alternatives have therefore been proposed that solve for the one-electron density matrix directly, without first solving for the orbitals. In this work, we present an efficient dense linear algebra implementation of Niklasson’s density matrix purification schemes on graphics processing units (GPUs). The computational bottleneck in these methods is the matrix-matrix multiplication needed to construct the purification polynomials, which can be accelerated using GPUs. Of particular interest in this work are consumer-grade GPUs, which perform best with algorithms that maximize the share of operations carried out in single precision (FP32). We therefore present a tailored mixed precision (MP) scheme that leverages much of the FP32 performance of these GPUs without sacrificing numerical accuracy in the self-consistent field (SCF) calculations. We demonstrate that, in combination with the semi-empirical GFN2-xTB method, our MP implementation is faster than diagonalization-based density matrix builds using LAPACK (Intel oneMKL DSYGVD) and cuSOLVER DSYGVD for molecules with more than 1000 basis functions. At the same time, the numerical precision of the energies and gradients is not significantly affected by the MP scheme compared to a full double precision (FP64) treatment. This gives access to significant accelerations of semi-empirical calculations on commodity computing hardware. Going further, we show that our asynchronous GPU implementation allows multiple SCF calculations to run in parallel on a single GPU, which makes it possible to use our implementation to accelerate state-of-the-art conformational sampling procedures based on molecular dynamics and metadynamics simulations.
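To make the idea of density matrix purification concrete, the following is a minimal CPU sketch of a second-order spectral projection (SP2) iteration of the kind introduced by Niklasson, written with NumPy. It is not the implementation described in the abstract. It assumes an orthogonal basis (no overlap matrix), a gapped spectrum, Gershgorin estimates for the spectral bounds, and a density matrix with occupations of 1 (no closed-shell factor of 2). The function name `sp2_purify`, its parameters, and the tolerance heuristic are hypothetical choices made for illustration. The `dtype` argument only switches the whole iteration between FP64 and FP32; it does not reproduce the tailored mixed precision scheme of this work.

```python
import numpy as np


def sp2_purify(H, n_occ, dtype=np.float64, max_iter=100):
    """Illustrative SP2 purification: density matrix without diagonalization.

    H      : symmetric Hamiltonian in an orthogonal basis (assumption)
    n_occ  : number of occupied orbitals (target trace of the projector)
    dtype  : precision used for the matrix products (float64 or float32)
    Returns (P, iterations, converged).
    """
    H = np.asarray(H, dtype=np.float64)
    n = H.shape[0]

    # Gershgorin circles give cheap bounds on the spectrum of H.
    diag = np.diag(H)
    radii = np.sum(np.abs(H), axis=1) - np.abs(diag)
    e_min = np.min(diag - radii)
    e_max = np.max(diag + radii)

    # Map the spectrum into [0, 1] with occupied states near 1.
    X = ((e_max * np.eye(n) - H) / (e_max - e_min)).astype(dtype)

    # Heuristic idempotency tolerance scaled to the working precision.
    tol = 100.0 * n * np.finfo(dtype).eps

    converged = False
    for iteration in range(1, max_iter + 1):
        X2 = X @ X  # the matrix product is the dominant cost
        tr_X = np.trace(X, dtype=np.float64)
        tr_X2 = np.trace(X2, dtype=np.float64)

        # trace(X - X^2) vanishes when all eigenvalues are 0 or 1.
        if abs(tr_X - tr_X2) < tol and abs(tr_X - n_occ) < 0.1:
            converged = True
            break

        # Pick the polynomial whose trace lands closer to n_occ:
        # X^2 lowers the trace, 2X - X^2 raises it.
        if abs(tr_X2 - n_occ) < abs(2.0 * tr_X - tr_X2 - n_occ):
            X = X2
        else:
            X = 2 * X - X2

    return X.astype(np.float64), iteration, converged
```

Each iteration costs one matrix-matrix multiplication, which is why the approach maps well to GPUs. Running the same sketch with `dtype=np.float32` typically yields a less accurate density matrix than the FP64 run, which illustrates why a mixed precision scheme has to be designed with care rather than obtained by simply lowering the precision everywhere.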

Keywords

self-consistent field algorithms
semi-empirical methods
GPU acceleration
density matrix purification
mixed precision handling

Supplementary materials

Supplementary material: Additional figures with computational timings
