QSPR Models for Predicting Critical Micelle Concentration of Gemini Cationic Surfactants Combining Machine-Learning Methods and Molecular Descriptors
A data set of 231 structurally diverse gemini cationic surfactants was compiled to correlate the logarithm of the critical micelle concentration (cmc) with molecular structure using quantitative structure-property relationship (QSPR) methods. The QSPR models were built in the Online CHEmical Modeling environment (OCHEM), which provides several machine-learning methods and molecular descriptor sets for building QSPR models. Molecular descriptors were calculated with eight software packages: Dragon v6, OEstate and ALogPS, CDK, ISIDA Fragment, Chemaxon, Inductive Descriptor, SIRMS, and PyDescriptor. A total of 64 individual QSPR models were generated, and a consensus model was built as the simple average of the 13 top-ranked individual models. Judged by its statistical coefficients, the consensus model performed best of all the QSPR models, with R² = 0.95, q² = 0.95, RMSE = 0.16 and MAE = 0.11 for the training set, and R² = 0.87, q² = 0.87, RMSE = 0.35 and MAE = 0.21 for the test set. The model is freely available at https://ochem.eu/model/8425670 and can be used to estimate the cmc of new gemini cationic surfactants at the early stages of their development.
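The consensus step and the reported statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OCHEM implementation: the function names `consensus_predict` and `regression_stats`, ranking the individual models by cross-validated RMSE, and the metric conventions (R² as the squared Pearson correlation between observed and predicted log cmc, q² as the coefficient of determination 1 − SS_res/SS_tot) are all assumptions made for illustration, and OCHEM's exact definitions may differ.

```python
import math


def consensus_predict(model_predictions, cv_rmse, k=13):
    """Simple-average consensus of the k top-ranked individual models.

    model_predictions: dict of model name -> list of predicted log(cmc) values,
        one value per compound, in the same compound order for every model.
    cv_rmse: dict of model name -> cross-validated RMSE. Ranking models by
        this score is an assumption made for illustration (hypothetical).
    """
    # Rank models from lowest to highest RMSE and keep the best k.
    selected = sorted(cv_rmse, key=cv_rmse.get)[:k]
    if not selected:
        raise ValueError("no models to average")
    # zip(*...) regroups the per-model lists into per-compound tuples.
    per_compound = zip(*(model_predictions[name] for name in selected))
    return [sum(values) / len(selected) for values in per_compound]


def regression_stats(observed, predicted):
    """R2, q2, RMSE and MAE under the assumed conventions described above."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("observed and predicted values must not be constant")
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    return {
        "R2": cov * cov / (ss_obs * ss_pred),  # squared Pearson correlation
        "q2": 1.0 - ss_res / ss_obs,           # coefficient of determination
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical toy values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
offset = [2.0, 3.0, 4.0, 5.0]  # every prediction is 1.0 too high
print(regression_stats(observed, offset))
```

In this toy case every prediction is shifted by a constant 1.0, so R² stays at 1 while q² drops to about 0.2 and both RMSE and MAE equal 1.0; this is one reason correlation-based and error-based statistics are usually reported together.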