Abstract
Accurate knowledge of the electronic properties of molecular excited states is fundamental to understanding the behavior of functional materials for organic electronics and sensors. In this work, we focus on determining the properties of the most intense peak in the electronic absorption spectra of organic molecules. For this purpose, we employed the quantum chemistry dataset QM-symex, which contains approximately 173,000 organic molecules with time-dependent DFT (TD-DFT) data for the first ten electronic absorption transitions. Each molecule is specified by its Cartesian coordinates. From the data in the original QM-symex, we built a new dataset, named QM-symex-modif, that contains the molecules in Simplified Molecular Input Line Entry System (SMILES) format together with the properties of the main electronic transition. We then employed twenty machine learning (ML) algorithms to predict oscillator strengths, excitation energies, transition orbitals, and the highest occupied molecular orbitals (HOMOs). As inputs for the ML algorithms, we used several chemical descriptors generated for each molecule with RDKit from its SMILES representation. These descriptors significantly improved the accuracy of the ML predictions for these key photophysical properties. Low mean absolute errors (MAEs) were obtained for the test set of 45,056 molecules: 0.035 for oscillator strengths, 0.09 eV for excitation energies, 1.24 and 0.62 for the initial and final transition molecular orbital (MO) numbers, respectively (i.e., for each molecule, their position in the MO listing), and 0.014 for HOMO numbers, with R² values consistently exceeding 0.94, demonstrating the accuracy of the models. Additionally, a Shapley additive explanation (SHAP) analysis was carried out to evaluate the importance of the input parameters for the investigated ML models. The analysis revealed several relationships involving the input parameters; in particular, molecular weight is highly important in our ML models for determining the target HOMO numbers and the transition orbitals.
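As a rough illustration of two ideas used above, the following sketch shows how the most intense transition could be picked out of a molecule's list of computed transitions, and how a mean absolute error is computed. It is not the authors' code: the `Transition` record, its field names, and the numerical values are hypothetical, and it assumes each transition can be summarized by a single initial/final MO pair.

```python
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    """Hypothetical record for one computed absorption transition."""
    energy_ev: float     # excitation energy in eV
    osc_strength: float  # oscillator strength (dimensionless)
    mo_from: int         # initial MO number (assumed single dominant pair)
    mo_to: int           # final MO number


def main_transition(transitions: Sequence[Transition]) -> Transition:
    """Return the transition with the largest oscillator strength.

    If several transitions tie, the first one in the sequence is returned.
    """
    if not transitions:
        raise ValueError("at least one transition is required")
    return max(transitions, key=lambda t: t.osc_strength)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_true_i - y_pred_i|)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only; they are not taken from QM-symex.
example = [
    Transition(energy_ev=3.10, osc_strength=0.002, mo_from=20, mo_to=22),
    Transition(energy_ev=4.25, osc_strength=0.410, mo_from=21, mo_to=22),
    Transition(energy_ev=4.90, osc_strength=0.120, mo_from=19, mo_to=22),
]
peak = main_transition(example)
```

Here `peak` is the second record, since it has the largest oscillator strength; its energy, oscillator strength, and MO numbers would then serve as the per-molecule prediction targets.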
Supplementary materials
Title
Supporting Information.
Description
Supporting information discussed in the text.
Supplementary weblinks
Title
Group Github
Description
The source code of this work, machine learning model parameters, input files, SHAP values, and output examples are available in our GitHub repository.