Chemical databases are an essential tool for data-driven investigation of structure-property relationships and design of novel functional compounds. We introduce the first phase of the COMPAS Project – a COMputational database of Polycyclic Aromatic Systems. In this phase, we have developed two datasets containing the optimized ground-state structures and a selection of molecular properties of 34k and 9k cata-condensed polybenzenoid hydrocarbons (at the GFN2-xTB and B3LYP-D3BJ/def2-SVP levels, respectively), and have placed them in the public domain. Herein we describe the process of the dataset generation, detail the information available within the datasets, and show the fundamental features of the generated data. We analyze the correlation between the two types of computation as well as the structure-property relationships of the calculated species. The data and the insights gained from them can inform rational design of novel functional aromatic molecules for use in, e.g., organic electronics, and can provide a basis for additional data-driven machine- and deep-learning studies in chemistry.
Supporting Information for COMPAS_Phase1
General computational details, description of the benchmarking procedure, histograms of data distribution, color-coded plots for all studied structural features, and further analysis of D3 versus D4 corrections.