WS24: A diverse data set for predicting metal-organic framework stability in water and harsh environments with data-driven models

23 April 2024, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Metal-organic frameworks (MOFs) are porous materials with applications in gas separations and catalysis, but poor water stability often limits their practical use, given the ubiquity of water in air and the environment. Consequently, it is useful to predict whether a MOF is water-stable before investing time and resources into its synthesis. Existing heuristics for designing water-stable MOFs lack generality and, because their criteria are narrowly defined, artificially limit the diversity of the chemistry explored. Machine learning (ML) models promise more general predictions but require diverse experimental MOF stability data for training. Improving on previous efforts, we enlarge the available training data for MOF water stability prediction by over 400%, adding 911 MOFs with water stability labels assigned through semi-automated manuscript analysis, to curate the new data set WS24. The additional data is shown to improve ML model performance (test ROC-AUC > 0.8) across diverse chemistry for predicting both water stability and stability in harsher, acidic conditions. We illustrate how the expanded data set and models can be combined with previously developed activation stability models in genetic algorithms that quickly screen ~10,000 MOFs from a space of hundreds of thousands for candidates with multivariate stability (i.e., stability upon activation, in water, and in acid). Model analysis and genetic algorithm results uncover metal- and geometry-specific design rules for robust MOFs. The data set and ML models developed in this work, which we disseminate through an easy-to-use web interface, are expected to contribute toward the accelerated discovery of novel, water-stable MOFs for applications such as direct air capture and water treatment.
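To make the idea of genetic-algorithm screening for multivariate stability more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical encoding of each MOF as indices into three building-block libraries (metal node, linker, topology), replaces the trained activation, water, and acid stability classifiers with toy stand-in functions (`p_activation`, `p_water`, `p_acid` are illustrative only), and uses the product of the three predicted probabilities as the fitness. That product objective is an assumption; the preprint's actual objective and encoding may differ.

```python
import random
from typing import List, Tuple

# Hypothetical encoding: (metal node index, linker index, topology index).
Genome = Tuple[int, ...]
LIBRARY_SIZES = (20, 50, 10)  # toy library sizes, NOT the real design space


# Toy stand-ins for trained classifiers returning P(stable) in [0, 1].
def p_activation(g: Genome) -> float:
    return 0.5 + 0.5 * g[2] / (LIBRARY_SIZES[2] - 1)


def p_water(g: Genome) -> float:
    return g[0] / (LIBRARY_SIZES[0] - 1)


def p_acid(g: Genome) -> float:
    return 1.0 - abs(g[1] - 25) / 25


def fitness(g: Genome) -> float:
    # Assumed multivariate objective: a candidate scores high only if it is
    # predicted stable for activation, in water, AND in acid.
    return p_activation(g) * p_water(g) * p_acid(g)


def random_genome(rng: random.Random) -> Genome:
    return tuple(rng.randrange(n) for n in LIBRARY_SIZES)


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # Uniform crossover: each building block is inherited from either parent.
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(g: Genome, rng: random.Random, rate: float = 0.2) -> Genome:
    # Replace each building block with a random one at the given rate.
    return tuple(rng.randrange(n) if rng.random() < rate else x
                 for x, n in zip(g, LIBRARY_SIZES))


def tournament(pop: List[Genome], rng: random.Random, k: int = 3) -> Genome:
    # Pick the fittest of k randomly sampled candidates.
    return max(rng.sample(pop, k), key=fitness)


def run_ga(generations: int = 30, pop_size: int = 40, n_elite: int = 4,
           seed: int = 0) -> Tuple[Genome, List[float]]:
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = pop[:n_elite]  # elitism: keep the best candidates unchanged
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


best, history = run_ga()
print("best genome:", best, "fitness:", round(history[-1], 3))
```

Because the elite candidates are carried forward unchanged, the best fitness never decreases between generations. The GA evaluates only a small fraction of the combinatorial space, which is the property that makes this kind of search attractive when the full space is too large to score exhaustively.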


metal organic frameworks
machine learning

Supplementary materials

Supplementary document
Supplementary figures and tables

