Abstract
Accurate prediction of thermophysical properties across physical states is essential for industrial and scientific applications. However, experimental measurements are often noisy and variable, so robust modeling approaches are required. In this work, we use machine learning (ML) to predict the thermophysical properties of methane in the liquid, vapor, and supercritical phases, including isobaric and isochoric heat capacities, density, volume, Joule-Thomson coefficients, enthalpies, sound speed, and viscosities, applying a recently developed approach (ACS Eng. Au, DOI: 10.1021/acsengineeringau.5c00001). We explored a range of ML algorithms: Adaptive Boosting, Bagging, Decision Trees, Extra Trees, Gradient Boosting, Histogram-based Gradient Boosting Regression Tree, K-Nearest Neighbors, Light Gradient Boosting Machine, Nu-Support Vector Regression, Random Forest, Extreme Gradient Boosting, and Artificial Neural Networks. The ML predictions aligned more closely with the statistically treated National Institute of Standards and Technology (NIST) data than with the raw experimental data used to train the models. These results highlight the ability of ML to identify and generalize complex patterns, smooth inherent noise, and handle the variability of different thermophysical properties. They indicate that ML models, particularly Extra Trees and Gradient Boosting, can provide a scalable alternative to traditional methods for thermophysical property prediction, with gains in consistency and efficiency. Although our approach does not eliminate preprocessing, it demonstrates that ML can handle noisy data on its own, offering a more efficient and cost-effective alternative to conventional pre- and post-processing techniques.
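The noise-smoothing behavior described in the abstract can be illustrated with a minimal sketch. It is not the pipeline or the data used in this work: the smooth "reference" curve, the Gaussian noise level, the neighbor count k, and the helper names (knn_predict, rmse, smoothing_demo) are all assumptions chosen for illustration, and a hand-written K-Nearest Neighbors regressor stands in for the library models listed above. The sketch trains on noisy samples of a synthetic curve and compares the predictions against both the smooth curve and the noisy training values.

```python
import math
import random


def knn_predict(x_train, y_train, x_query, k):
    """Average the targets of the k training points nearest to x_query."""
    nearest = sorted(range(len(x_train)),
                     key=lambda i: abs(x_train[i] - x_query))[:k]
    return sum(y_train[i] for i in nearest) / k


def rmse(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def smoothing_demo(n=200, noise=0.1, k=15, seed=0):
    """Fit KNN on noisy samples; return (RMSE vs reference, RMSE vs noisy)."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]
    # Hypothetical smooth curve standing in for a statistically treated reference.
    reference = [math.sin(3.0 * xi) for xi in x]
    # Synthetic "raw measurements": the reference plus Gaussian noise.
    noisy = [r + rng.gauss(0.0, noise) for r in reference]
    # Predictions at the training inputs, using only the noisy targets.
    pred = [knn_predict(x, noisy, xi, k) for xi in x]
    return rmse(pred, reference), rmse(pred, noisy)


err_vs_reference, err_vs_noisy = smoothing_demo()
print(f"RMSE vs smooth reference:    {err_vs_reference:.4f}")
print(f"RMSE vs noisy training data: {err_vs_noisy:.4f}")
```

In this toy setting, averaging over neighbors cancels much of the independent noise, so the predictions sit closer to the smooth curve than to the noisy values they were trained on. Whether the same holds for a real property data set depends on the noise structure and the model, which this sketch does not attempt to reproduce.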
Supplementary materials
Title
Supporting materials for methane
Description
The Supporting Information is organized into four sections. Section S1 (“Number of Data and Machine Learning Models”) summarizes the amount of experimental data from the literature that the National Institute of Standards and Technology (NIST) used to produce its equations, and lists the ML models applied to each thermophysical property. Section S2 (“Experimental Data Visualization”) examines the diversity of the input data and identifies potential anomalies, which are also considered during model training. Section S3 (“Best metrics”) reports the best performance metrics for each thermophysical property in each physical state. Section S4 (“Others relevant data”) explains how to access additional data generated in this study, namely the detailed performance metrics for all ML models applied to each property and the final hyperparameter configurations selected via GridSearchCV for each ML model and thermodynamic phase.