QSPR Models for Predicting Critical Micelle Concentration of Gemini Cationic Surfactants Combining Machine-Learning Methods and Molecular Descriptors

A data set of 231 diverse gemini cationic surfactants was compiled to correlate the logarithm of the critical micelle concentration (cmc) with molecular structure using quantitative structure-property relationship (QSPR) methods. The QSPR models were developed using the Online CHEmical Modeling environment (OCHEM), which provides several machine-learning methods and molecular descriptor sets for building QSPR models. Molecular descriptors were calculated with eight different software packages, including Dragon v6, OEstate and ALogPS, CDK, ISIDA Fragment, Chemaxon, Inductive Descriptor, SIRMS, and PyDescriptor. A total of 64 QSPR models were generated, and one consensus model was developed as a simple average of the 13 top-ranked individual models. Based on the statistical coefficients of the QSPR models, the consensus model was the best. It gave R2 = 0.95, q2 = 0.95, RMSE = 0.16, and MAE = 0.11 for the training set, and R2 = 0.87, q2 = 0.87, RMSE = 0.35, and MAE = 0.21 for the test set. The model is freely available at https://ochem.eu/model/8425670 and can be used to estimate the cmc of new gemini cationic surfactants at the early stages of their development.
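The consensus step and the reported statistics can be illustrated with a minimal sketch. It is not the OCHEM implementation: the function names are hypothetical, the numbers are made-up toy values rather than data from this study, and it assumes that R2 denotes the squared Pearson correlation between observed and predicted values while q2 denotes the coefficient of determination, 1 - SS_res/SS_tot.

```python
import math


def consensus_predict(model_predictions):
    """Simple (unweighted) average of per-compound predictions across models.

    model_predictions: one list of predicted log(cmc) values per model,
    all in the same compound order.
    """
    n_models = len(model_predictions)
    return [sum(column) / n_models for column in zip(*model_predictions)]


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Squared Pearson correlation (assumed meaning of R2 here)."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def q2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed meaning of q2)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the 231-compound data set.
    observed = [-3.0, -2.5, -4.0, -3.5]
    individual_models = [
        [-3.1, -2.4, -3.8, -3.6],  # hypothetical model A
        [-2.9, -2.7, -4.1, -3.3],  # hypothetical model B
        [-3.2, -2.5, -4.2, -3.5],  # hypothetical model C
    ]
    consensus = consensus_predict(individual_models)
    print("R2   =", round(r2(observed, consensus), 3))
    print("q2   =", round(q2(observed, consensus), 3))
    print("RMSE =", round(rmse(observed, consensus), 3))
    print("MAE  =", round(mae(observed, consensus), 3))
```

Averaging tends to cancel uncorrelated errors of the individual models, which is one plausible reason a consensus of top-ranked models can outperform each of its members.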