The COMPAS Project: A Computational Database of Polycyclic Aromatic Systems. Phase 2: cata-condensed Hetero-Polycyclic Aromatic Systems

26 July 2023, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Polycyclic aromatic systems are highly important to numerous applications, especially to organic electronics and optoelectronics. High-throughput screening and generative models can help to identify new molecules that can advance these technologies but require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available data set of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our data set contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. The methodologies used to enumerate and compute the various structures and their electronic properties (including HOMO-LUMO gap, vertical and adiabatic ionization potential, and electron affinity) are detailed. Additionally, we benchmark the values against a ~50k data set calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new data sets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
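The idea of correcting xTB values against a higher-level reference can be illustrated with a minimal sketch. The actual fitting scheme is described in the paper and its Supporting Information, not in this abstract, so the linear form below is an assumption chosen for illustration only. The function names `fit_linear_correction` and `apply_correction` are hypothetical, and the numbers are synthetic placeholders rather than COMPAS data.

```python
import numpy as np


def fit_linear_correction(xtb_values, ref_values):
    """Least-squares fit of ref ≈ slope * xtb + intercept (assumed linear form)."""
    slope, intercept = np.polyfit(xtb_values, ref_values, deg=1)
    return slope, intercept


def apply_correction(xtb_values, slope, intercept):
    """Map xTB-level values onto the reference scale using the fitted line."""
    return slope * np.asarray(xtb_values, dtype=float) + intercept


# Synthetic placeholder data (NOT taken from the COMPAS data sets):
# pretend the reference values are a noisy linear function of the xTB values.
rng = np.random.default_rng(0)
xtb_gap = rng.uniform(1.0, 4.0, size=200)  # hypothetical xTB gaps, eV
ref_gap = 1.5 * xtb_gap + 0.8 + rng.normal(0.0, 0.05, size=200)  # hypothetical reference gaps, eV

slope, intercept = fit_linear_correction(xtb_gap, ref_gap)
corrected_gap = apply_correction(xtb_gap, slope, intercept)

# Mean absolute error against the reference, before and after the correction.
mae_before = np.mean(np.abs(xtb_gap - ref_gap))
mae_after = np.mean(np.abs(corrected_gap - ref_gap))
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"MAE before={mae_before:.3f} eV, after={mae_after:.3f} eV")
```

In practice, one would fit on the subset of molecules for which both levels of theory are available and then apply the fitted parameters to the rest of the xTB data; whether a single linear fit per property is adequate is a question the benchmark itself would have to answer.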


polycyclic aromatic systems
organic electronics
aromatic molecules
high-throughput computation
polycyclic aromatic hydrocarbons

Supplementary materials

Supporting Information for COMPAS-2
General computational details, details of the xTB-correction, description of benchmarking procedure, histograms of data distribution, color-coded plots for all studied structural features, further analysis of the effect of sulfur on Etot.


