Theoretical and Computational Chemistry

An open-source framework for fast-yet-accurate calculation of quantum mechanical features

Authors

  • Eike Caldeweyher, Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Sweden
  • Christoph Bauer, Data Science and Modelling, Pharmaceutical Sciences, ...
  • Ali Soltani Tehrani, Data Science and Modelling, Pharmaceutical Sciences, ...

Abstract

We present the open-source framework kallisto, which enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities, the predictive power of the presented method competes with that of second-order perturbation theory in a converged atomic-orbital basis set at a fraction of its computational cost. Robustness tests on a diverse validation set of more than 80,000 molecules show that the calculation of isotropic molecular polarizabilities has a low failure rate of only 0.3 %. Furthermore, we present a generally applicable van der Waals radius model that is rooted in atomic static polarizabilities. Efficiency tests show that such radii can be calculated even for small- to medium-sized proteins, the largest system being the SARS-CoV-2 spike protein with 42,539 atoms. Following the work of Domingo-Almenara et al. [Domingo-Almenara et al., Nat. Commun., 2019, 10, 5811], we present computational predictions of retention times for different chromatographic methods and describe how physicochemical features improve the predictive power of machine-learning models that otherwise rely only on two-dimensional features such as molecular fingerprints. Additionally, we developed an internal benchmark set of experimental supercritical fluid chromatography retention times. For these methods, improvements of up to 17 % are obtained when molecular fingerprints are combined with physicochemical descriptors. Shapley additive explanation values further show that the physical nature of the applied features can be retained within the final machine-learning models. We generally recommend the kallisto framework as a robust, low-cost, and physically motivated featurizer for upcoming state-of-the-art machine-learning studies.
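Two quantities in the abstract, the isotropic polarizability and a van der Waals radius derived from atomic static polarizabilities, can be made concrete with a short Python sketch. This is a hypothetical illustration, not the kallisto implementation: the isotropic value is taken as one third of the trace of a 3x3 polarizability tensor (the standard definition), and the radius is obtained by scaling a free-atom reference radius with the ratio of an in-molecule to a free-atom polarizability raised to the power 1/7. That exponent is motivated by the seventh-root relation reported between free-atom polarizabilities and van der Waals radii; the function names, units, and default exponent are assumptions.

```python
"""Hypothetical sketch of polarizability-based features (not kallisto's code).

Assumptions: polarizabilities in atomic units (bohr^3), radii in bohr,
and a seventh-root scaling of a free-atom reference radius.
"""
from typing import Sequence


def isotropic_polarizability(tensor: Sequence[Sequence[float]]) -> float:
    """Return one third of the trace of a 3x3 static polarizability tensor."""
    if len(tensor) != 3 or any(len(row) != 3 for row in tensor):
        raise ValueError("expected a 3x3 tensor")
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


def scaled_vdw_radius(
    alpha_atom: float,
    alpha_free: float,
    r_free: float,
    exponent: float = 1.0 / 7.0,
) -> float:
    """Scale a free-atom radius by (alpha_atom / alpha_free) ** exponent.

    An atom whose in-molecule polarizability is lower than that of the free
    atom receives a smaller radius. The default exponent of 1/7 is an
    assumption, not a documented kallisto parameter.
    """
    if alpha_atom <= 0.0 or alpha_free <= 0.0 or r_free <= 0.0:
        raise ValueError("polarizabilities and radius must be positive")
    return r_free * (alpha_atom / alpha_free) ** exponent


if __name__ == "__main__":
    # Diagonal tensor with illustrative (made-up) components in bohr^3.
    print(isotropic_polarizability([[10.0, 0.0, 0.0],
                                    [0.0, 12.0, 0.0],
                                    [0.0, 0.0, 14.0]]))  # 12.0
    # Illustrative numbers only: halving the polarizability shrinks
    # the radius by a factor of 2 ** (-1/7).
    print(scaled_vdw_radius(alpha_atom=5.0, alpha_free=10.0, r_free=3.5))
```

The seventh root makes the radius only weakly sensitive to changes in polarizability, so moderate shifts in the atomic environment translate into small radius adjustments.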

Supplementary weblinks

kallisto: A command-line interface to simplify computational modelling and the generation of atomic features
Efficiently calculate 3D-atomic/molecular features for quantitative structure-activity relationship approaches.
Benchmark set for static molecular polarizabilities
The data in this repository were extracted from the supporting information of Thakkar (2015). All structures were optimized using density functional theory at the CAM-B3LYP-D3(B)/def2-TZVP level of theory.
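Benchmarking predicted static polarizabilities against reference values such as those in this set comes down to a few error statistics. The hypothetical Python helper below computes the mean signed deviation, the mean absolute deviation, and the mean absolute relative deviation in percent; these particular statistics are a common choice and are assumed here, not taken from the paper. Reading the repository's files is omitted because their format is not described here.

```python
"""Hypothetical benchmark statistics for predicted vs. reference polarizabilities."""
from statistics import mean
from typing import Sequence, Tuple


def error_statistics(
    predicted: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (MSD, MAD, MARD in %) of predicted relative to reference values.

    Both sequences must use the same units (e.g. bohr^3) and the same molecule order.
    """
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(r == 0.0 for r in reference):
        raise ValueError("reference values must be non-zero for relative errors")
    diffs = [p - r for p, r in zip(predicted, reference)]
    msd = mean(diffs)                  # systematic over- or underestimation
    mad = mean(abs(d) for d in diffs)  # typical absolute error
    mard = 100.0 * mean(abs(d) / abs(r) for d, r in zip(diffs, reference))
    return msd, mad, mard


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not data from the benchmark set.
    print(error_statistics([11.0, 9.0, 20.5], [10.0, 10.0, 20.0]))
```

Reporting the signed deviation alongside the absolute ones separates a systematic bias of a method from its random scatter.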