A systematic and general machine learning approach to build a consistent data set from different experiments: application to the thermal conductivity of methane

Matheus Máximo-Canadas; Julio Cesar Duarte; Jakler Nichele; Leonardo Alves; Luiz Octavio Pereira; Rogerio Ramos; Itamar Borges

doi:10.26434/chemrxiv-2024-g6bsl-v2

Chemical Engineering and Industrial Chemistry



18 December 2024, Version 2

This is not the most recent version; a newer version of this content is available.

Working Paper


This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Experimental data from different sources are challenging to combine because of variability and noise arising from differences in experimental conditions, apparatuses, and environmental factors. In this work, we propose a general method to address these challenges and build a consistent data set. As a case study, we analyze experimental data sets of methane’s thermal conductivity across the liquid, vapor, and supercritical phases. The method is based on machine learning (ML) techniques that consistently integrate data from the various experimental sources compiled in the National Institute of Standards and Technology (NIST) database. Several ML algorithms are used for this purpose. Our findings indicate that the ML models yield predictions closer to NIST’s processed data than to the original raw experimental data used to train them. This demonstrates the models’ ability to generalize from heterogeneous, noisy, and untreated data sets. While our approach does not eliminate preprocessing, it suggests that ML can autonomously handle noisy data, providing a faster and more cost-effective alternative to traditional pre- and postprocessing methods. By guiding the refinement of labor-intensive methods, ML is adaptable to real-time data, enabling immediate adjustments in industrial and scientific optimization. Therefore, the proposed ML approach is general and efficient in handling complex, heterogeneous data and delivers reliable predictions without extensive preprocessing.
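The central claim, that a model fitted to pooled, noisy measurements can land closer to a curated reference than to its own training data, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the smooth `reference` curve, the three simulated "sources" with their own biases and noise levels, and the use of k-nearest-neighbour regression are hypothetical stand-ins, not the authors' models or real NIST methane data.

```python
import math
import random

# Hypothetical smooth "reference" curve standing in for a curated
# correlation. This is NOT real NIST methane thermal-conductivity data.
def reference(x: float) -> float:
    return 0.03 + 0.02 * x + 0.01 * math.sin(3.0 * x)

def make_sources(seed: int = 0) -> list[tuple[float, float]]:
    """Simulate three hypothetical 'experiments', each with its own
    systematic bias and random noise level, and pool their points."""
    rng = random.Random(seed)
    data = []
    for bias, noise in [(0.002, 0.004), (-0.003, 0.003), (0.001, 0.006)]:
        for _ in range(60):
            x = rng.uniform(0.0, 2.0)
            y = reference(x) + bias + rng.gauss(0.0, noise)
            data.append((x, y))
    return data

def knn_predict(train: list[tuple[float, float]], x: float, k: int = 15) -> float:
    """k-nearest-neighbour regression: average the targets of the k
    training points whose x is closest to the query x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def rmse(a: list[float], b: list[float]) -> float:
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

if __name__ == "__main__":
    train = make_sources()
    xs = [x for x, _ in train]
    raw = [y for _, y in train]
    pred = [knn_predict(train, x) for x in xs]
    ref = [reference(x) for x in xs]
    print(f"RMSE model vs raw data:  {rmse(pred, raw):.5f}")
    print(f"RMSE model vs reference: {rmse(pred, ref):.5f}")
```

In this toy setup the model tracks the reference more closely than the raw points because averaging over neighbours cancels the random noise and the per-source biases, which were chosen to average to zero. A bias shared by every source would instead be learned by the model rather than removed.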

Keywords

Machine Learning

Experimental data sets

Supplementary materials


Evaluation of multiple ML models for all physical phases


The Supporting Information contains the evaluation of multiple ML models for all physical phases. It includes performance metrics for the ML models in the liquid, vapor, and supercritical phases, as well as scatter plots comparing the experimental data with the predictions of our ML models and with the NIST data.
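The specific performance metrics are not named on this page. As a hedged sketch, assuming the common regression metrics MAE, RMSE, the coefficient of determination R², and the average absolute relative deviation (AARD, often reported for thermophysical properties), the comparison between model predictions and reference values could be computed as below. The function name `regression_metrics` is hypothetical.

```python
import math

def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Hypothetical helper returning MAE, RMSE, R^2 and AARD (%) for paired
    reference values (y_true) and model predictions (y_pred).
    AARD assumes every reference value is nonzero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # R^2 is undefined when all reference values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    aard = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "AARD_percent": aard}

if __name__ == "__main__":
    # Illustrative numbers only, not values from this work.
    print(regression_metrics([0.030, 0.040, 0.050], [0.031, 0.039, 0.052]))
```

Reporting an absolute metric (RMSE) alongside a relative one (AARD) is a common choice when a property spans a wide range across phases, since each highlights errors in a different part of that range.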


Version History

Feb 24, 2025 Version 3

Dec 18, 2024 Version 2

Jul 10, 2024 Version 1

Version Notes

More information on error metrics was included, as well as new plots of the raw experimental data.

Metrics

Views: 810

Downloads: 299

License

The content is available under CC BY-NC-ND 4.0.

DOI

10.26434/chemrxiv-2024-g6bsl-v2

Funding

Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro

E-26/201.197/2021, E-26/211.046/2021, E-26/201.251/2022, and E-26/201.190/202

Conselho Nacional de Desenvolvimento Científico e Tecnológico

304148/2018–0 and 409447/2018–8

Petrobras

code 2021/00093-

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
