Abstract
The prediction of chemical reaction outcomes using machine learning (ML) has emerged as a powerful tool for advancing materials synthesis. However, this approach requires large and diverse datasets, which are extremely limited in the field of nanomaterials synthesis because of inconsistent and non-standardized reporting in the literature and a lack of understanding of synthetic mechanisms. In this study, we extracted the parameters of InP quantum dot (QD) syntheses as inputs, and the resultant properties (absorption, emission, diameter) as outputs, from 72 publications. We “filled in” missing outputs using a data imputation method to prepare a complete dataset containing 216 entries for training and testing predictive ML models. We defined the descriptor space in two ways (condensed and extended), based on the chemical identity or role of the reagents, to explore the best approach for categorizing input features. With our best ML model, we achieved mean absolute errors (MAEs) as low as 20.29, 11.46, and 0.33 nm for absorption, emission, and diameter, respectively. We used these models to deploy an accessible and interactive webapp for designing syntheses of InP (https://share.streamlit.io/cossairt-lab/indium-phosphide/Hot_injection/hot_injection_prediction.py). Using this webapp, we investigated the power of ML to uncover chemical trends in InP syntheses, such as the effects of common additives. We also designed and conducted new experiments based on extensions of literature procedures and compared the experimentally measured properties to the predictions, thus evaluating the “real-life” accuracy of our models. Conversely, we designed an experiment to obtain InP QDs with specific target properties. Finally, we applied the same approach to train, test, and launch predictive models for CdSe QDs by expanding a previously published dataset.
Altogether, our data pre-processing method and ML implementations in this study show the ability to design materials with targeted properties and explore underlying reaction mechanisms despite limited data resources.
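As a rough illustration of the two quantitative steps named above (filling in missing outputs and scoring predictions by MAE), the following minimal Python sketch may help. It is not the method used in this study: the abstract does not specify the imputation algorithm, so simple column-mean imputation is used here as a stand-in, and the function names and all numerical values are hypothetical.

```python
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumption: column-mean imputation is only a stand-in for whatever
    imputation method a real workflow would use. Every column must contain
    at least one observed value, otherwise statistics.mean raises an error.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical output table: (absorption / nm, emission / nm, diameter / nm).
# None marks a property that a publication did not report.
outputs = [
    [450.0, 520.0, 2.0],
    [500.0, None, 3.0],
    [None, 600.0, 4.0],
]
completed = impute_column_means(outputs)

# Hypothetical measured vs. predicted diameters for three test syntheses.
measured_diameter = [2.0, 3.0, 4.0]
predicted_diameter = [2.5, 2.5, 4.0]
diameter_mae = mean_absolute_error(measured_diameter, predicted_diameter)
```

In this toy table, the missing emission value is filled with the mean of the two reported emissions and the missing absorption value with the mean of the two reported absorptions. An MAE reported in nm, as in the abstract, has the same interpretation as `diameter_mae` here: the average size of the prediction error in the units of the property.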
Supplementary materials
Title
Supplementary information
Description
Additional details for data acquisition, data imputation, Pearson correlation, datasets, code files, machine learning modeling, and experimental methods.