Abstract
Experimental data drawn from different sources vary and carry noise because of differing experimental conditions, apparatuses, and environmental factors. In this work, we propose a general method for building a consistent data set from such sources. As a case study, we analyze experimental data sets of methane’s thermal conductivity across the liquid, vapor, and supercritical phases. The method uses machine learning (ML) techniques to integrate, in a consistent way, data from various experimental sources compiled in the National Institute of Standards and Technology (NIST) database. Several ML algorithms are used for this purpose. Our findings indicate that the ML models yield predictions closer to NIST’s processed data than to the raw experimental data used to train them. This demonstrates the models’ ability to generalize from heterogeneous, noisy, and untreated data sets. While our approach does not eliminate preprocessing, it suggests that ML can handle noisy data autonomously, providing a faster and more cost-effective alternative to traditional pre- and postprocessing methods. ML can also guide the refinement of labor-intensive methods and is adaptable to real-time data, enabling immediate adjustments that could substantially improve industrial and scientific optimization. The proposed ML approach is therefore general and efficient in handling complex, heterogeneous data, and it delivers reliable predictions without extensive preprocessing.
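The central comparison in the abstract, model predictions measured against both the raw training data and a smooth reference, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not methane measurements), the `reference` curve is a hypothetical stand-in for curated data, the three sources with their offsets and noise levels are invented, and a simple k-nearest-neighbour average stands in for the ML algorithms, which the abstract does not name.

```python
import math
import random


def reference(x):
    """Smooth synthetic stand-in for a curated reference curve (not methane data)."""
    return 1.0 + 0.5 * x + 0.3 * x * x


def make_sources(rng, n_per_source=100):
    # Hypothetical sources: (systematic offset, noise standard deviation).
    sources = [(0.03, 0.05), (0.0, 0.10), (-0.03, 0.15)]
    xs, ys = [], []
    for offset, sigma in sources:
        for _ in range(n_per_source):
            x = rng.random()
            xs.append(x)
            ys.append(reference(x) + offset + rng.gauss(0.0, sigma))
    return xs, ys


def knn_predict(x_query, xs, ys, k=15):
    # Average the targets of the k training points nearest to x_query.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))[:k]
    return sum(ys[i] for i in nearest) / k


def rmse(a, b):
    # Root-mean-square difference between two equal-length sequences.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def run_demo(seed=0):
    rng = random.Random(seed)
    xs, ys = make_sources(rng)  # pooled, untreated data from all sources
    preds = [knn_predict(x, xs, ys) for x in xs]
    rmse_raw = rmse(preds, ys)  # distance to the noisy training data
    rmse_ref = rmse(preds, [reference(x) for x in xs])  # distance to the reference
    return rmse_raw, rmse_ref


if __name__ == "__main__":
    rmse_raw, rmse_ref = run_demo()
    print(f"RMSE vs raw pooled data: {rmse_raw:.4f}")
    print(f"RMSE vs reference curve: {rmse_ref:.4f}")
```

In this toy setting the averaging model lands closer to the reference curve than to the raw data because the invented source offsets cancel and the noise is zero-mean. It shows only the shape of the comparison, not the paper's results.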
Supplementary materials
Title
Evaluation of multiple ML models for all physical phases
Description
The supporting information evaluates multiple ML models for all physical phases. It includes performance metrics for the ML models in the liquid, vapor, and supercritical phases, as well as scatter plots comparing the experimental data with our ML model predictions and the NIST data.