Abstract
For semi-empirical electronic structure methods, solving the Roothaan–Hall equations to determine the one-electron density matrix is generally the computational bottleneck. Therefore, alternatives have been proposed that solve directly for the one-electron density matrix without first computing the orbitals. In this work, we present an efficient dense linear algebra implementation of Niklasson's density matrix purification schemes on graphics processing units (GPUs). The computational bottleneck in these methods is the matrix-matrix multiplication needed to construct the purification polynomials, which can be accelerated using GPUs. Of particular interest in this work are consumer-grade GPUs, which perform best with algorithms that maximize the share of operations carried out in single precision (FP32). We therefore present a tailored mixed precision (MP) scheme that exploits much of the FP32 performance of these GPUs without sacrificing numerical accuracy in the self-consistent field (SCF) calculations. We demonstrate that, in combination with the semi-empirical GFN2-xTB method, our MP implementation is faster than density matrix builds based on LAPACK (Intel oneMKL DSYGVD) and cuSOLVER DSYGVD diagonalization for molecules with more than 1000 basis functions. At the same time, the numerical precision of the energies and gradients is not significantly affected by the MP scheme compared to a full double precision (FP64) treatment. This enables significant accelerations of semi-empirical calculations on commodity computing hardware. Furthermore, we show that our asynchronous GPU implementation can run multiple SCFs in parallel on a single GPU, which allows it to accelerate state-of-the-art conformational sampling procedures based on molecular dynamics and metadynamics simulations.
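To illustrate why purification turns the density matrix build into a sequence of matrix multiplications, the sketch below implements one of Niklasson's schemes, second-order spectral projection (SP2), in pure Python and full double precision. It is a minimal sketch, not the implementation described in this work. We assume that SP2 is representative of the purification variant used, that the Hamiltonian is symmetric and already expressed in an orthogonal basis, and that there is a gap after the `n_occ` lowest states. The GPU execution and the mixed precision scheme are not reproduced here. The helper names `matmul`, `gershgorin_bounds` and `sp2_density_matrix` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical helpers: an FP64 sketch of SP2 purification, not the authors' GPU/MP code.


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product; on a GPU this is the GEMM that dominates the cost."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def trace(a: Matrix) -> float:
    return sum(a[i][i] for i in range(len(a)))


def gershgorin_bounds(h: Matrix) -> Tuple[float, float]:
    """Cheap lower/upper bounds on the eigenvalues of a symmetric matrix."""
    lo, hi = float("inf"), float("-inf")
    for i, row in enumerate(h):
        radius = sum(abs(x) for j, x in enumerate(row) if j != i)
        lo = min(lo, row[i] - radius)
        hi = max(hi, row[i] + radius)
    return lo, hi


def sp2_density_matrix(h: Matrix, n_occ: int, tol: float = 1e-10, max_iter: int = 100) -> Matrix:
    """SP2 purification: projector onto the n_occ lowest eigenstates of h.

    Occupation is 1 per orbital; multiply by 2 for a closed-shell density matrix.
    Assumes h is symmetric, in an orthogonal basis, and gapped after n_occ states.
    """
    n = len(h)
    e_min, e_max = gershgorin_bounds(h)
    # Map the spectrum of h linearly into [0, 1], with the lowest (occupied) states near 1.
    x = [[((e_max if i == j else 0.0) - h[i][j]) / (e_max - e_min) for j in range(n)]
         for i in range(n)]
    for _ in range(max_iter):
        x2 = matmul(x, x)  # the only expensive operation per iteration
        tr_x, tr_x2 = trace(x), trace(x2)
        # Idempotency error tr(X - X^2) vanishes once X is a projector.
        if abs(tr_x - tr_x2) < tol:
            break
        # X^2 pushes eigenvalues toward 0, 2X - X^2 pushes them toward 1;
        # pick whichever brings tr(X) closer to n_occ.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            x = x2
        else:
            x = [[2.0 * x[i][j] - x2[i][j] for j in range(n)] for i in range(n)]
    return x
```

For example, `sp2_density_matrix([[0.0, 1.0], [1.0, 0.0]], 1)` returns the projector onto the eigenvector of eigenvalue -1, `[[0.5, -0.5], [-0.5, 0.5]]`. Because each iteration costs only one matrix product and a few traces, the work maps directly onto dense GEMM kernels, which is what makes the approach attractive on GPUs.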
Supplementary materials
Supplementary material: Additional figures with computational timings.
Supplementary weblinks
Zip file with calculation data: Raw data of the benchmark calculations.