Abstract
Chemical databases are an essential tool for data-driven investigation of structure-property relationships and design of novel functional compounds. We introduce the first phase of the COMPAS Project – a COMputational database of Polycyclic Aromatic Systems. In this phase, we have developed two datasets containing the optimized ground-state structures and a selection of molecular properties of 34k and 9k cata-condensed polybenzenoid hydrocarbons (at the GFN2-xTB and B3LYP-D3BJ/def2-SVP levels, respectively), and have placed them in the public domain. Herein we describe the process of the dataset generation, detail the information available within the datasets, and show the fundamental features of the generated data. We analyze the correlation between the two types of computation as well as the structure-property relationships of the calculated species. The data and the insights gained from them can inform rational design of novel functional aromatic molecules for use in, e.g., organic electronics, and can provide a basis for additional data-driven machine- and deep-learning studies in chemistry.

Supplementary material

Supporting Information for COMPAS_Phase1
General computational details, description of the benchmarking procedure, histograms of the data distribution, color-coded plots for all studied structural features, and further analysis of D3 versus D4 corrections.
Supplementary weblinks
Repository of COMPAS
Freely accessible repository of the COMPAS database.