Revealing Structure-Property Relationships in Polybenzenoid Hydrocarbons with Interpretable Machine-Learning



The structure-property relationships of polybenzenoid hydrocarbons (PBHs) were investigated with interpretable machine learning, for which two new tools were developed and applied. First, a novel textual molecular representation based on the annulation sequence of PBHs was defined and developed. This representation can be used either in its textual form or as the basis for a curated feature vector; both forms are more interpretable than the standard SMILES representation, and the textual form also achieves higher predictive accuracy. Second, the recently developed CUSTODI model was applied for the first time as an interpretable model and identified important structural features that impact various electronic molecular properties. The resulting insights not only validate several well-known “rules of thumb” of organic chemistry but also reveal new behaviors and influential structural motifs, thus providing guiding principles for the rational design and fine-tuning of PBHs.


Supplementary material

Supporting Information for LALAS paper
Details of computational methods and model construction. Full fit results. Additional comparison to other models.

Supplementary weblinks

Repository for LALAS paper
All data and code used in the described work.