Data-driven imputation of miscibility of aqueous solutions via graph-regularized logistic matrix factorization

Diba Behnoudfar; Cory Simon; Joshua Schrier

doi:10.26434/chemrxiv-2023-t2q5h

Physical Chemistry

05 June 2023, Version 1

This is not the most recent version. There is a newer version of this content available.

Working Paper

This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Aqueous two-phase systems (ATPSs) may form upon mixing two solutions of independently water-soluble compounds. Many separation, purification, and extraction processes rely on ATPSs. Predicting the miscibility of solutions can accelerate the discovery of new ATPSs for these applications and reduce its cost. Whereas previous machine learning approaches to ATPS prediction used physicochemical properties of each solute as descriptors, in this work we show how to impute missing miscibility outcomes directly from an incomplete collection of pairwise miscibility experiments. We use graph-regularized logistic matrix factorization to learn a latent vector for each solution from (i) the observed entries in the pairwise miscibility matrix and (ii) a graph (nodes: solutes, edges: shared relationships) indicating the general category of the solute (i.e., polymer, surfactant, salt, protein). Using an experimental dataset of the pairwise miscibility of 68 solutions from Peacock et al. [ACS Appl. Mater. Interfaces 2021, 13, 9], we show that graph-regularized logistic matrix factorization predicts missing (im)miscibility outcomes of pairs of solutions more accurately than either ordinary logistic matrix factorization or random forest classifiers that use physicochemical features of the compounds.
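To make the idea in the abstract concrete, the following is a minimal Python sketch of graph-regularized logistic matrix factorization on a small synthetic example. It is not the authors' code (their implementation is in Julia), and the details are assumptions made for illustration: the specific loss (cross-entropy over observed pairs, plus a graph penalty weighted by `gamma` and an L2 penalty weighted by `lam`), the scalar bias `b`, plain gradient descent, the function name `fit_grlmf`, and the toy data with two made-up categories.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_grlmf(M, observed, edges, k=2, gamma=0.1, lam=0.01,
              lr=0.05, n_iter=2000, seed=0):
    """Hypothetical helper: fit latent vectors C (n x k) and a bias b.

    Assumed model: P(pair i, j is immiscible) = sigmoid(c_i . c_j + b).
    M        : symmetric n x n matrix of 0/1 outcomes (1 = immiscible)
    observed : symmetric boolean mask of the entries used for training
    edges    : list of (i, j) index pairs linking related solutes
    """
    M = np.asarray(M, dtype=float)
    W = np.asarray(observed, dtype=float).copy()
    np.fill_diagonal(W, 0.0)  # self-pairs are never used
    n = M.shape[0]
    rng = np.random.default_rng(seed)
    C = 0.1 * rng.standard_normal((n, k))
    b = 0.0

    # Graph Laplacian Lap = D - A, so that
    # sum over edges of ||c_i - c_j||^2 equals trace(C^T Lap C).
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    Lap = np.diag(A.sum(axis=1)) - A

    for _ in range(n_iter):
        P = sigmoid(C @ C.T + b)
        R = W * (P - M)  # residuals on observed entries only
        # Gradients of: cross-entropy over unordered observed pairs
        #   + gamma * trace(C^T Lap C) + lam * ||C||^2
        grad_C = R @ C + 2.0 * gamma * (Lap @ C) + 2.0 * lam * C
        grad_b = 0.5 * R.sum()  # each unordered pair appears twice in R
        C -= lr * grad_C
        b -= lr * grad_b
    return C, b


# Toy data (made up): 8 solutions in two categories; in this toy world
# only pairs of category-0 solutions are immiscible.
n = 8
category = np.array([0, 0, 0, 0, 1, 1, 1, 1])
M_true = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j and category[i] == 0 and category[j] == 0:
            M_true[i, j] = 1.0

# Graph edges join solutions that share a category.
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if category[i] == category[j]]

# Hide two pairs; the mask keeps them out of the training loss.
hidden = [(0, 1), (2, 6)]
observed = ~np.eye(n, dtype=bool)
for i, j in hidden:
    observed[i, j] = observed[j, i] = False

C, b = fit_grlmf(M_true, observed, edges)
P_hat = sigmoid(C @ C.T + b)
for i, j in hidden:
    print(f"pair ({i}, {j}): predicted P(immiscible) = {P_hat[i, j]:.2f}, "
          f"true outcome = {int(M_true[i, j])}")
```

Setting `gamma=0` in this sketch drops the graph term and leaves ordinary logistic matrix factorization (with the assumed L2 penalty), which is the kind of baseline comparison the abstract describes.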

Supplementary materials

Title: Supporting Information

Description: Complete miscibility matrix, example loss function optimizations, distribution of predictions, visualization and 3D plots of the learned latent vectors, visualization of the latent space with $\gamma = 0$, fraction of immiscible solutions by category, and F1, accuracy, precision, and recall performance metrics for the models. (PDF)

Supplementary weblinks

Title: Data and Code

Description: Julia code and data needed to reproduce this work.

Version History

Aug 21, 2023 Version 2

Jun 05, 2023 Version 1

Metrics

Views: 1,152

Downloads: 407

License

The content is available under CC BY 4.0

Funding

National Science Foundation: CBET-1920945

National Science Foundation: PHY-2226511

National Science Foundation: CNS-2018427

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
