A systematic and general machine learning approach to build a consistent data set from different experiments

10 July 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Experimental data from different sources present challenges due to variability and noise arising from different experimental conditions, apparatuses, and environmental factors. In this work, we propose a general method that addresses these challenges and builds a consistent data set, demonstrated on experimental thermal conductivity data sets of methane in the liquid, vapor, and supercritical phases. Methane is a key hydrocarbon with extensive industrial and environmental applications. The method is based on machine learning (ML) techniques, which are used to consistently integrate data from various experimental sources compiled in the National Institute of Standards and Technology (NIST) database. Several ML algorithms are evaluated for this purpose. Our findings indicate that predictions from ML models trained on raw experimental data are closer to NIST’s processed data than the raw experimental data themselves, demonstrating the models’ ability to generalize from heterogeneous, noisy, and untreated data sets. The proposed ML approach is general and efficient in handling complex and heterogeneous data, delivering reliable predictions without extensive preprocessing.
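The central comparison described above (model predictions versus raw measurements, both judged against a processed reference) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than methane measurements, the reference curve, source offsets, and noise level are hypothetical, temperature is the only input, and a plain k-nearest-neighbors average stands in for the ML algorithms, which the abstract does not name.

```python
import random


def reference_curve(temperature_k):
    """Hypothetical smooth reference curve (NOT real methane data)."""
    return 0.030 + 1.0e-4 * (temperature_k - 100.0)


def make_raw_measurements(n_points, seed=0):
    """Pool synthetic points from three sources with assumed offsets and noise."""
    rng = random.Random(seed)
    source_offset = {"A": 0.003, "B": -0.003, "C": 0.0}  # assumed systematic offsets
    rows = []
    for _ in range(n_points):
        source = rng.choice(sorted(source_offset))
        temperature_k = rng.uniform(100.0, 400.0)
        value = (reference_curve(temperature_k)
                 + source_offset[source]
                 + rng.gauss(0.0, 0.004))  # assumed random measurement noise
        rows.append((temperature_k, value))
    return rows


def knn_predict(train_rows, temperature_k, k=15):
    """Average the k pooled measurements closest in temperature."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - temperature_k))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def mean_abs_error(predicted, expected):
    """Mean absolute deviation between two equally long sequences."""
    return sum(abs(p - e) for p, e in zip(predicted, expected)) / len(expected)


raw = make_raw_measurements(300)
temperatures = [t for t, _ in raw]
reference = [reference_curve(t) for t in temperatures]

# Deviation of the untreated measurements from the reference curve.
raw_mae = mean_abs_error([value for _, value in raw], reference)
# Deviation of the model predictions (fitted on the raw data only) from the reference.
model_mae = mean_abs_error([knn_predict(raw, t) for t in temperatures], reference)

print(f"raw data vs reference MAE:   {raw_mae:.5f}")
print(f"model vs reference MAE:      {model_mae:.5f}")
```

In this toy setting the model never sees the reference curve, yet averaging over pooled sources damps both the per-source offsets and the random noise, which is the kind of effect the abstract reports for the real data.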

Keywords

Machine Learning
Experimental data sets
Systematic data sets
NIST
Thermal conductivity
Methane

Supplementary materials

Title: Evaluation of multiple ML models for all physical phases
Description: Supporting information contains the evaluation of multiple ML models for all physical phases. It includes performance metrics for the ML models in the liquid, vapor, and supercritical phases, as well as scatter plots comparing the experimental data with our ML models and NIST data.
