Abstract
Metal-organic frameworks (MOFs) are promising porous materials for applications such as gas adsorption, separation, transportation, and photocatalysis, but their large-scale computational screening requires high-quality, computation-ready structural data. Existing databases often contain errors arising from experimental limitations, including inaccurately determined hydrogen positions, atomic overlaps, and missing components. We introduce MOFChecker to address these issues, providing tools for duplicate detection, geometric and charge error checking, and structure correction. Errors are corrected systematically through atomic-level adjustments to the structures in a database, including deleting duplicate structures and adding missing hydrogen atoms, counterions, and linkers. Evaluation of established MOF databases, such as the CoRE2014 database, indicates that 38% of structures contain significant errors, highlighting the importance of MOFChecker in ensuring accurate structural data for subsequent density functional theory (DFT) optimizations and computational studies. This work aims to enhance the reliability of MOF databases for high-throughput screening and practical applications.