The tmQM Dataset - Quantum Geometries and Properties of 86k Transition Metal Complexes


We report the transition metal quantum mechanics dataset (tmQM), which contains the geometries and properties of a large transition metal-organic compound space. tmQM comprises 86,665 mononuclear complexes extracted from the Cambridge Structural Database, including Werner, bioinorganic and organometallic complexes based on a large variety of organic ligands and 30 transition metals (the 3d, 4d and 5d metals of groups 3 to 12). All complexes are closed-shell, with a formal charge in the range {+1, 0, -1}e. The tmQM dataset provides the Cartesian coordinates of all metal complexes optimized at the DFTB(GFN2-xTB) level, together with their molecular size, stoichiometry, and metal node degree. The quantum properties were computed at the DFT(TPSSh-D3BJ/def2-SVP) level and include the electronic and dispersion energies, HOMO and LUMO orbital energies, HOMO-LUMO gap, dipole moment, and natural charge of the metal center; DFTB(GFN2-xTB) polarizabilities are also provided. Pairwise representations of these properties showed low correlation between them, yielding nearly continuous maps that include unusual regions of the chemical space, e.g. complexes combining large polarizabilities with wide HOMO-LUMO gaps, and complexes combining low-energy HOMO orbitals with electron-rich metal centers. The tmQM dataset can be exploited in the data-driven discovery of new metal complexes, including predictive models based on machine learning. These models may have a strong impact on the fields in which transition metal chemistry plays a key role, e.g. catalysis, organic synthesis, and materials science. tmQM is an open dataset that can be downloaded free of charge from
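
As an illustration of the pairwise analysis mentioned above, the following minimal sketch derives a HOMO-LUMO gap from the two orbital energies and computes Pearson correlation coefficients for every pair of properties. It is not the procedure used to build tmQM: the column names (`homo_energy`, `lumo_energy`, `dipole_moment`, `polarizability`), the helper functions, and the four rows of numbers are all hypothetical placeholders and are not taken from the dataset.

```python
import csv
import io
import math
from itertools import combinations

# Toy rows with made-up values (NOT taken from tmQM).
# The column names are hypothetical and chosen only for this sketch.
TOY_CSV = """homo_energy,lumo_energy,dipole_moment,polarizability
-0.20,-0.05,3.1,350.0
-0.18,-0.07,5.4,410.0
-0.22,-0.03,1.2,290.0
-0.19,-0.06,4.0,505.0
"""


def load_columns(text):
    """Parse CSV text into a dict mapping column name to a list of floats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: [float(row[name]) for row in rows] for name in rows[0]}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pairwise_correlations(columns):
    """Correlation coefficient for every unordered pair of columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


columns = load_columns(TOY_CSV)

# The gap is the difference between the LUMO and HOMO orbital energies.
columns["homo_lumo_gap"] = [
    lumo - homo
    for homo, lumo in zip(columns["homo_energy"], columns["lumo_energy"])
]

correlations = pairwise_correlations(columns)
for (a, b), r in sorted(correlations.items()):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Coefficients close to zero for a given pair would correspond to the low correlation described above; with only four invented rows the printed values carry no chemical meaning and serve only to show the mechanics.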


Supplementary material