Abstract
Aqueous two-phase systems (ATPSs) may form upon mixing two solutions of independently water-soluble compounds. Many separation, purification, and extraction processes rely on ATPSs. Predicting the miscibility of solutions can accelerate the discovery of new ATPSs for these applications and reduce its cost. Whereas previous machine learning approaches to ATPS prediction used physicochemical properties of each solute as descriptors, in this work we show how to impute missing miscibility outcomes directly from an incomplete collection of pairwise miscibility experiments. We use graph-regularized logistic matrix factorization to learn a latent vector for each solution from (i) the observed entries in the pairwise miscibility matrix and (ii) a graph (nodes: solutes, edges: shared relationships) indicating the general category of the solute (i.e., polymer, surfactant, salt, protein). Using an experimental dataset of the pairwise miscibility of 68 solutions from Peacock et al. [ACS Appl. Mater. Interfaces 2021, 13, 9], we show that graph-regularized logistic matrix factorization predicts missing (im)miscibility outcomes of pairs of solutions more accurately than either ordinary logistic matrix factorization or random forest classifiers that use physicochemical features of the compounds.
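To make the idea concrete, the following is a minimal sketch of graph-regularized logistic matrix factorization on a synthetic toy problem. It is not the authors' implementation. The loss form (cross-entropy over observed pairs, plus a Laplacian penalty weighted by `gamma`, plus an L2 penalty weighted by `lam`), the plain gradient-descent optimizer, and the names `fit_graph_lmf`, `predict`, `k`, `lam`, and `lr` are all assumptions made for illustration. The toy labels (1 for pairs in the same hypothetical group, 0 otherwise) are synthetic and carry no chemical meaning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_graph_lmf(M, observed, A, k=2, gamma=0.1, lam=0.01,
                  lr=0.05, epochs=2000, seed=0):
    """Fit latent vectors C (n x k) and a bias b by gradient descent.

    Assumed loss (illustrative, not taken from the paper):
      sum over observed (i, j) of cross-entropy(M_ij, sigmoid(c_i . c_j + b))
      + gamma * trace(C^T L C)   # pulls graph neighbors' vectors together
      + lam * ||C||_F^2          # keeps the vectors bounded
    M: symmetric 0/1 outcome matrix; observed: symmetric boolean mask;
    A: symmetric 0/1 adjacency matrix of the category graph.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    C = 0.1 * rng.standard_normal((n, k))
    b = 0.0
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
    for _ in range(epochs):
        P = sigmoid(C @ C.T + b)
        R = np.where(observed, P - M, 0.0)      # residuals on observed pairs only
        # R and L are symmetric, so each term's gradient is 2 * (matrix @ C).
        grad_C = 2.0 * R @ C + 2.0 * gamma * (L @ C) + 2.0 * lam * C
        grad_b = R.sum()
        C -= lr * grad_C
        b -= lr * grad_b
    return C, b

def predict(C, b):
    """Predicted probability of outcome 1 for every pair."""
    return sigmoid(C @ C.T + b)

# Synthetic toy data: six hypothetical solutions in two hypothetical categories.
groups = np.array([0, 0, 0, 1, 1, 1])
same = groups[:, None] == groups[None, :]
A = same.astype(float)
np.fill_diagonal(A, 0.0)                        # category graph, no self-loops
M = A.copy()                                    # toy rule: label 1 iff same category

observed = ~np.eye(6, dtype=bool)               # the diagonal is never used
for i, j in [(0, 1), (2, 5)]:                   # hide two pairs to impute
    observed[i, j] = observed[j, i] = False

C, b = fit_graph_lmf(M, observed, A)
P = predict(C, b)
print("imputed P[0,1]:", round(float(P[0, 1]), 3))
print("imputed P[2,5]:", round(float(P[2, 5]), 3))
```

Under this assumed loss, setting `gamma=0` drops the graph term and leaves ordinary logistic matrix factorization, so the same function can serve as the unregularized baseline in a comparison.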
Supplementary materials
Title
Supporting Information
Description
Complete miscibility matrix, example loss function optimizations, distribution of predictions, visualization and 3D plots of the learned latent vectors, visualization of the latent space with $\gamma = 0$, fraction of immiscible solutions by category, F1, accuracy, precision, and recall performance metrics for the models. (PDF)