Physical Chemistry

WS22 database: combining Wigner Sampling and geometry interpolation towards configurationally diverse molecular datasets

Authors

  • Max Pinheiro Jr, Aix-Marseille University
  • Shuang Zhang, State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University
  • Pavlo O. Dral, State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, and College of Chemistry and Chemical Engineering, Xiamen University
  • Mario Barbatti, Aix-Marseille University

Abstract

Multidimensional surfaces of quantum chemical properties, such as potential energies and dipole moments, are common targets for machine learning, requiring the development of robust and diverse databases that extensively explore molecular configurational spaces. Here we composed the WS22 database covering several quantum mechanical (QM) properties (including potential energies, forces, dipole moments, polarizabilities, and HOMO and LUMO energies) for ten flexible organic molecules of increasing complexity and with up to 22 atoms. This database consists of 1.18 million equilibrium and non-equilibrium geometries carefully sampled from Wigner distributions centered at different equilibrium conformations (in either the ground or excited electronic states) and further augmented with interpolated structures. The diversity of our data sets is demonstrated by visualizing the distribution of geometries with dimensionality reduction, as well as by comparing statistical features of the QM properties with those available in existing data sets. Our sampling targets a broader quantum mechanical distribution of the configurational space than that provided by the commonly used sampling through classical molecular dynamics, upping the challenge for machine learning models.

Supplementary weblinks

WS22 dashboard
This website provides an interactive dashboard to explore the main statistical features of the datasets available in the WS22 database. Using the dashboard, the user can visualize histograms of the statistical distributions of the geometrical and quantum mechanical properties computed at the DFT level. The dashboard also provides direct visualization of the 3D molecular structures, with an option to download specific geometries. The information shown in the dashboard is read directly from the Zenodo repository (https://zenodo.org/record/7032334#.YyQmE9JByXJ), which contains 10 independent data files (NumPy NPZ format) with 120,000 points corresponding to the geometrical configurations of the molecules sampled from a Wigner distribution.
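
Because the data files are distributed in NumPy NPZ format, they can in principle be inspected without the dashboard. The following is a minimal sketch of how the contents of such an archive might be listed. The helper name `summarize_npz`, the file name `demo_molecule.npz`, and the array keys `R` and `E` are hypothetical and used only for illustration; the actual file names and keys should be taken from the Zenodo repository. A small synthetic archive stands in for a real WS22 file here.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an NPZ archive."""
    # np.load on an .npz file returns a lazy archive object; its `files`
    # attribute lists the names of the stored arrays.
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Synthetic stand-in for a downloaded data file (NOT real WS22 data).
# The keys "R" (geometries) and "E" (energies) are assumptions made for
# this example and may differ from the keys used in the repository.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_molecule.npz")
n_geoms, n_atoms = 5, 3
np.savez(
    demo_path,
    R=np.zeros((n_geoms, n_atoms, 3)),  # Cartesian coordinates per geometry
    E=np.zeros((n_geoms, 1)),           # one energy value per geometry
)

summary = summarize_npz(demo_path)
for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Pointing `summarize_npz` at a file downloaded from the repository would list the arrays it actually contains, which is a reasonable first step before selecting properties for training.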