Abstract
Accurate prediction of melting points for pure molecules remains a significant challenge in predictive chemistry, with implications for fields including materials science, drug discovery, and separations chemistry. Traditional methods, such as group contribution (GC) techniques, have shown limited success because of the complex relationship between molecular structure and melting point. In this study, we present a data-driven machine learning (ML) approach to predicting the melting points of organic compounds that leverages both 2D and 3D molecular descriptors. Our dataset comprises 19,811 compounds with 2D features, of which a subset of 4,568 compounds also has 3D features. We refined the feature set with feature selection methods including pair-wise correlation filtering, Boruta, and principal component analysis. We evaluated the predictive performance of several ML models: linear regression, ensemble-based regression (random forest, gradient-boosted regression, and extreme gradient-boosted regression), support vector regression, and deep learning. The extreme gradient-boosted regression (XGBR) model performed best, with a mean absolute error (MAE) of 27.64 K using 2D features and 31.58 K using combined 2D and 3D features. Outlier detection and removal further improved model accuracy. In addition, SHAP (SHapley Additive exPlanations) analysis revealed the relative importance of individual features, making the models more interpretable. Our results indicate that ML models, particularly XGBR, can significantly improve melting point predictions and offer a robust tool for the scientific community.
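The abstract names pair-wise correlation filtering as one feature-selection step and reports accuracy as MAE in kelvin. The snippet below is a minimal sketch of both ideas, not the authors' implementation. It assumes that descriptors are plain lists of floats keyed by a name, that the |r| cutoff of 0.95 is purely illustrative, and that when two descriptors are highly correlated the one encountered later is dropped. The descriptor names `desc_a`, `desc_b` and `desc_c` and their values are hypothetical toy data. Boruta and PCA are not shown.

```python
"""Sketch: greedy pair-wise correlation filtering and MAE scoring.

Assumptions (not from the source): descriptors are lists of floats keyed by
name, the |r| cutoff of 0.95 is illustrative, and the later of two highly
correlated columns is the one dropped.
"""
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treated as uncorrelated here
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Return the descriptor names kept after greedy pair-wise filtering.

    `columns` maps descriptor name -> values (one per compound). A column is
    kept only if |r| < threshold against every column already kept.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def mean_absolute_error(y_true, y_pred):
    """MAE in the units of the target (kelvin for melting points)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy descriptors: desc_b is almost exactly 2 * desc_a.
    toy = {
        "desc_a": [1.0, 2.0, 3.0, 4.0],
        "desc_b": [2.0, 4.1, 5.9, 8.0],
        "desc_c": [3.0, 1.0, 4.0, 1.0],
    }
    print(drop_correlated(toy))
    print(mean_absolute_error([300.0, 350.0, 400.0], [310.0, 345.0, 415.0]))
```

With this toy input, `desc_b` is discarded because its correlation with `desc_a` is about 0.9996, and `desc_c` is kept (|r| ≈ 0.26). The MAE call returns 10.0 K. Filtering greedily keeps the procedure simple and deterministic. A real pipeline might instead decide which member of a correlated pair to keep based on its correlation with the target.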
Scientific Contribution: The P2MAT application can predict both melting and boiling points from SMILES strings supplied as input. Its graphical user interface (GUI) is simple and easy to load.
Supplementary materials
Title
P2MAT: A machine learning (ML) driven software for Property Prediction of MATerial
Description
This document contains additional experimental results and a feature description table.