Can Machine Learning Be More Accurate Than TD-DFT? Prediction of Emission Wavelengths and Quantum Yields of Organic Fluorescent Materials

The prediction of photophysical parameters is of crucial practical importance for the development of functional organic fluorescent materials, but the expense of quantum mechanical calculations and the relatively low universality of QSAR models make the task challenging. Taking advantage of the new avenues opened up by machine learning (ML), we establish a database of solvated organic fluorescent dyes and develop highly efficient ML models for predicting the maximum emission and absorption wavelengths and the photoluminescence quantum yield (PLQY), providing a reliable and efficient approach to high-throughput screening. Various combinations of ML algorithms and molecular fingerprints were investigated. For emission wavelengths, TD-DFT-level accuracy was achieved under real-world conditions. Reliable identification of strongly fluorescent materials was also demonstrated. We show that easily obtainable fingerprint inputs combined with suitable ML algorithms enable efficient re-training on additional data points, so that the ML models can be improved systematically using experimental feedback.
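
The workflow summarized above (fingerprint inputs for a dye and its solvent, a regression model, and re-training when new measurements arrive) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the character n-gram fingerprint is a toy stand-in for a real molecular fingerprint, the similarity-weighted nearest-neighbour regressor is not one of the algorithms evaluated in this work, the names `ngram_fingerprint`, `encode_pair`, and `SimilarityRegressor` are hypothetical, and the target values in the usage example are placeholders, not measured wavelengths.

```python
import zlib


def ngram_fingerprint(smiles, n_bits=256, max_n=3):
    """Toy hashed fingerprint: bit indices set by SMILES character n-grams.

    Assumption: a stand-in for a real molecular fingerprint, used only to
    keep the sketch dependency-free. crc32 keeps the hashing deterministic.
    """
    bits = set()
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            bits.add(zlib.crc32(smiles[i:i + n].encode()) % n_bits)
    return frozenset(bits)


def encode_pair(dye, solvent, n_bits=256):
    """Combine dye and solvent bits; solvent bits are offset to avoid mixing."""
    dye_bits = ngram_fingerprint(dye, n_bits)
    solvent_bits = {b + n_bits for b in ngram_fingerprint(solvent, n_bits)}
    return frozenset(dye_bits | solvent_bits)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class SimilarityRegressor:
    """Similarity-weighted k-nearest-neighbour regressor (illustrative only)."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # list of (fingerprint, target value)

    def add(self, dye, solvent, value):
        """Re-training is just appending a new labelled data point."""
        self.samples.append((encode_pair(dye, solvent), float(value)))

    def predict(self, dye, solvent):
        if not self.samples:
            raise ValueError("no training data")
        query = encode_pair(dye, solvent)
        scored = sorted(
            ((tanimoto(query, fp), value) for fp, value in self.samples),
            reverse=True,
        )[: self.k]
        total = sum(sim for sim, _ in scored)
        if total == 0.0:
            # No overlap with any neighbour: fall back to their plain mean.
            return sum(value for _, value in scored) / len(scored)
        return sum(sim * value for sim, value in scored) / total


# Usage with placeholder targets (NOT experimental data).
model = SimilarityRegressor(k=1)
model.add("c1ccccc1", "O", 100.0)          # benzene in water, dummy value
model.add("c1ccc2ccccc2c1", "CCO", 200.0)  # naphthalene in ethanol, dummy value
before = model.predict("c1ccc2cc3ccccc3cc2c1", "CCO")  # anthracene, unseen

# Experimental feedback: add the new measurement and predict again.
model.add("c1ccc2cc3ccccc3cc2c1", "CCO", 300.0)
after = model.predict("c1ccc2cc3ccccc3cc2c1", "CCO")
```

Because the fingerprints are cheap to compute and this model stores its training set directly, adding a data point costs almost nothing. That is the property the abstract highlights, although a production model would use a proper cheminformatics fingerprint and one of the learning algorithms evaluated in the study.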