Uncertainty Quantification in Machine Learning for Glass Transition Temperature Prediction of Polymers

12 July 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Machine learning (ML) has become an important technique in materials science, accelerating the discovery and design of novel materials while lowering experimental costs. Uncertainty quantification (UQ) plays a pivotal role in accurate ML-based prediction and design of novel materials. In this study, we comprehensively evaluate six UQ methods for ML predictions of the glass transition temperature (T_g) of polymers: ensemble, Gaussian process regression (GPR), Monte Carlo dropout (MCD), mean-variance estimation (MVE), Bayesian neural network (BNN), and evidential deep learning (EDL). We assess these UQ methods using three metrics: Spearman’s rank correlation coefficient, calibration, and sparsification, offering a reference for data-driven polymer design. Our analysis covers test data, out-of-distribution data from experiments and molecular dynamics simulations, and high-T_g polymer data. The results indicate that the ML models are robust and effective in predicting polymer T_g values for test and experimental data. However, correlating actual errors with predicted uncertainties (standard deviations) remains a significant challenge, and the ML models are frequently overconfident, reporting uncertainties that are too low. Moreover, the accuracy of ML predictions improves when data with large uncertainties are excluded, suggesting a potential strategy for refining ML model performance.
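The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian-interval form of calibration, and the use of mean absolute error in the sparsification curve are all assumptions chosen for illustration, and the preprint's exact definitions may differ. The inputs are true T_g values, predicted means, and predicted standard deviations.

```python
import math
from statistics import NormalDist


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(abs_errors, sigmas):
    """Spearman rank correlation between absolute errors and uncertainties."""
    return pearson(ranks(abs_errors), ranks(sigmas))


def calibration_curve(y_true, y_pred, sigmas, levels):
    """Observed fraction of targets inside the central Gaussian interval
    at each nominal confidence level (assumed formulation)."""
    observed = []
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)
        inside = sum(abs(t - m) <= z * s
                     for t, m, s in zip(y_true, y_pred, sigmas))
        observed.append(inside / len(y_true))
    return observed


def sparsification_curve(y_true, y_pred, sigmas, fractions):
    """MAE of the remaining points after dropping the given fraction of
    points with the largest uncertainty; each fraction must be below 1."""
    order = sorted(range(len(sigmas)), key=lambda i: sigmas[i], reverse=True)
    errors = [abs(y_true[i] - y_pred[i]) for i in order]
    curve = []
    for fraction in fractions:
        kept = errors[int(fraction * len(errors)):]
        curve.append(sum(kept) / len(kept))
    return curve
```

Under these assumptions, a Spearman coefficient near 1 means the uncertainties rank the errors well, observed coverage below the nominal level indicates overconfidence, and a sparsification curve that falls as more high-uncertainty points are removed corresponds to the accuracy improvement the abstract describes.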

Keywords

Machine learning
Uncertainty Quantification
Polymer Informatics
Glass Transition Temperature

Supplementary materials

Supporting Information
Detailed explanations of MFF, results from hyperparameter tuning of ML models, descriptions of loss functions used in ML algorithms, SMILES representations and corresponding T_g values for 19 high-T_g homopolymers, and parity plots depicting the performance of the models on these high-T_g polymers.
