ARC-MOF: A Diverse Database of Metal-Organic Frameworks with DFT-Derived Partial Atomic Charges and Descriptors for Machine Learning

01 August 2022, Version 1
This content is a preprint and has not undergone peer review at the time of posting.


Metal-organic frameworks (MOFs) are a class of crystalline materials composed of metal nodes or clusters connected via semi-rigid organic linkers. Owing to their high surface area, porosity, and tunability, MOFs have received significant attention for numerous applications such as gas separation and storage. Atomistic simulations and data-driven methods (e.g., machine learning) have been used to screen large databases and to develop new MOFs for CO2 capture that were subsequently synthesized and validated experimentally. To enable data-driven materials discovery for any application, the first (and arguably most crucial) step is database curation. This work introduces the ab initio REPEAT charge MOF (ARC-MOF) database, a collection of ~280,000 MOFs that have been either experimentally characterized or computationally generated, spanning all publicly available MOF databases. A key feature of ARC-MOF is that it contains DFT-derived, electrostatic-potential-fitted partial atomic charges for each MOF. Additionally, ARC-MOF contains pre-computed descriptors for out-of-the-box machine learning applications. An in-depth analysis of the diversity of ARC-MOF with respect to the currently mapped design space of MOFs was performed, a critical yet commonly overlooked step in previously reported MOF databases. Using this analysis, balanced subsets of ARC-MOF for various machine learning purposes have been identified. Other chemical and geometric diversity analyses are presented, along with an analysis of the effect of the charge assignment method on atomistic simulations of gas uptake in MOFs.


metal-organic framework
MOF database
atomistic simulation
diversity analysis

Supplementary materials

Supporting Information
Further information on the geometric properties, RAC descriptors, diversity analysis, REPEAT charges, and GCMC simulations

