The structure-property relationships of polybenzenoid hydrocarbons (PBHs) were investigated with interpretable machine learning, for which two new tools were developed and applied. First, a novel textual molecular representation based on the annulation sequence of PBHs was defined. This representation can be used either in its textual form or as the basis for a curated feature vector. Both forms are more interpretable than the standard SMILES representation, and the textual form also yields higher predictive accuracy. Second, the recently developed model CUSTODI was applied for the first time as an interpretable model, and it identified important structural features that affect various electronic molecular properties. The resulting insights not only validate several well-known “rules of thumb” of organic chemistry but also reveal new behaviors and influential structural motifs, thus providing guiding principles for the rational design and fine-tuning of PBHs.
Supporting Information for LALAS paper