Surfactants are amphiphilic molecules that are widely used in consumer products, industrial processes, and biological applications. A key property of a surfactant is its critical micelle concentration (CMC), the concentration at which surfactant molecules undergo cooperative self-assembly into micelles in solution. However, tensiometry, the primary experimental method for measuring CMCs, is laborious and expensive. In this work, we show that graph convolutional neural networks (GCNs) can predict CMCs directly from the surfactant molecular structure. Specifically, we developed a GCN architecture that encodes the surfactant structure as a molecular graph and trained it on experimental CMC data. We found that the GCN predicts CMCs with higher accuracy than previously proposed methods and that it generalizes to anionic, cationic, zwitterionic, and nonionic surfactants. Molecular saliency maps revealed how atom types and surfactant molecular substructures contribute to CMCs; these contributions agree with physical rules that correlate constitutional and topological information to CMCs. Following such rules, we proposed a small set of new surfactants for which experimental CMCs are not available; for these molecules, the CMCs predicted by our GCN exhibited trends similar to those obtained from molecular simulations. These results provide evidence that GCNs can enable high-throughput screening of surfactants with desired self-assembly characteristics.
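To make the idea of encoding a molecule as a graph and convolving over it concrete, the following is a minimal pure-Python sketch. It is not the architecture developed in this work. It assumes a single graph convolution layer with the commonly used symmetric normalization of the adjacency matrix, one-hot element features, mean pooling, and a linear readout. The weights are arbitrary and untrained, all names (`ELEMENTS`, `predict_log_cmc`, and so on) are hypothetical, and the heavy-atom graph of 1-octanol is used only as a toy input.

```python
import math

# Toy molecular graph (assumption): heavy atoms of 1-octanol, written by hand.
ATOMS = ["C"] * 8 + ["O"]
BONDS = [(i, i + 1) for i in range(8)]  # linear chain C-C-...-C-O

ELEMENTS = ["C", "O", "N", "S"]  # assumed feature vocabulary


def one_hot(symbol):
    """Encode an element symbol as a one-hot node feature vector."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def normalized_adjacency(n, bonds):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def predict_log_cmc(atoms, bonds, w1, w_out):
    """Map a molecular graph to one scalar (a stand-in for log CMC)."""
    h = [one_hot(s) for s in atoms]
    a_hat = normalized_adjacency(len(atoms), bonds)
    h = gcn_layer(a_hat, h, w1)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over atoms
    return sum(p * w for p, w in zip(pooled, w_out))


# Arbitrary, untrained weights (assumption): 4 input features -> 3 hidden units.
W1 = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.4],
      [-0.1, 0.2, 0.6],
      [0.7, -0.5, 0.2]]
W_OUT = [0.4, -0.3, 0.9]

score = predict_log_cmc(ATOMS, BONDS, W1, W_OUT)
```

Because the readout pools over all atoms, the output does not depend on the order in which the atoms are listed, which is one reason graph-based models suit molecular inputs. The value of `score` here carries no physical meaning, since the weights are untrained.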