A Data-Science Approach to Predict the Heat Capacity of Nanoporous Materials

The heat capacity of a material is a fundamental property that is of great practical importance. For example, in a carbon capture process, the heat required to regenerate a solid sorbent is directly related to the heat capacity of the material. However, for most materials suitable for carbon capture applications the heat capacity is not known, and thus the standard procedure is to assume the same value for all materials. In this work, we developed a machine-learning approach, trained on density functional theory simulations, to accurately predict the heat capacity of these materials, i.e., zeolites, metal-organic frameworks, and covalent-organic frameworks. The accuracy of our prediction is confirmed with experimental data. Finally, for a temperature swing adsorption process that captures carbon from the flue gas of a coal-fired power plant, we show that for some materials the heat requirement is reduced by as much as a factor of two using the correct heat capacity.


DFT Calculations
All the CP2K calculations are run using the Automated Interactive Infrastructure and Database for Computational Science, AiiDA, 1 and the workflows used in this work are shared within the aiida-lsmo plugin (https://github.com/lsmo-epfl/aiida-lsmo). For the cell and geometry optimization of the crystal structures, the protocol described in our previous work was employed: 2 the reader can find the associated AiiDA workflow named Cp2kMultistageWorkChain as part of the aiida-lsmo plugin. The difference with our previous work on COFs is that for MOFs we used our recently published machine learning algorithm to assigning the oxidation state of the metals 3 and infer the initial magnetization of the system. No initial magnetization was used for zeolites. In addition, for MOF-74 structures with open shell metals (Mn, Fe, and Co), we considered multiple spin configurations, and we noticed negligible effect on the heat capacity of the materials.
In addition, another AiiDA workflow, Cp2kPhonopyWorkChain, was coded to automate the generation of the 6N displacements of atoms using phonopy 4 and to submit the CP2K calculations for computing the atomic forces of each configuration, using the same DFT settings as the previous cell optimization. This workchain is also shared within aiida-lsmo. we have less than ∼ 2% negative frequencies per structure. Neglecting these ∼ 2% negative frequencies leads to ∼ 2% in estimating the heat capacity of the materials.
As the lattice parameters of the MOF crystals are typically large (see Figure S. 1a), the electronic structure is often computed at gamma point with no k-point sampling. Similarly, in our calculations using CP2K, we do not consider k-point sampling. Here, we investigate the effect of this simplification on the vibrational frequencies of a small set of MOFs. We conducted a convergence test using k-point sampling for the SCF(Self-Consistent Field), geometry optimization, and the vibration frequencies using CRYSTAL17 code. 5      In this study, we report the heat capacity for the materials extracted from the databases listed in Table 1. For IZA, CURATED-COFs, and QMOF databases, we report the heat capacity for all the structures. However, for CoRE-MOF database, we remove structures with atomic overlaps. Also, CURATED-MOFs denotes to a subset of structures from CoRE-MOF database that we performed a series of checks using MOF-checker algorithm 10 and subsequently DFT optimised.

Temperature dependent ML predictions
To predict the temperature dependence of the heat capacity of nanoporous materials, we trained machine learning models for a wide range of temperatures using the same procedure explained in the method section and above. Using these machine learning models, we predicted the heat capacity of (Zn and Co-)-MOF-74 and IRMOF-1 (MOF-5). We compare the predictions of the heat capacity with the DFT and experimental measurements, as they are shown in Figure S

Feature importance analysis
To assess how different characteristics of an atom influences the heat capacity predicted by the machine learning model, we performed feature importance analysis using a game theoretic approach, SHAP (SHapley Additive exPlanations). 17,18 In Figure S. 9, we show how the different aspects, namely atomic, local geometry, and local chemistry features, influence the heat capacity. The importance of the atomic features are known: as it was discussed in the main text, the vibrations are dependent on the atomic mass. However, this analysis clearly demonstrates that, in addition to these atomic features, local geometry and chemistry play a crucial role in the atomic vibrations (in total ∼ 40%).  Remarkably, for Mn, the role of local geometry is considerably higher than the other metals, e.g., Ni. Careful investigation of the features demonstrate that the main difference between Mn and Ni is related to the pairwise distances between the metal-linker atoms, that is captured by AGNI fingerprint. The relatively longer bonds of Mn vs. Ni is responsible for slightly weaker bonds and lower frequencies, which leads to a higher atomic Cp. These are the small detail that machine learning can capture efficiently to predict the heat capacity of the materials, when provided with sufficient data and expressive features.

Materials Synthesis and Characterization
The  For ZIF-8, curated CIF files originate from the initial report 21 of the ZIF family of MOF.
Ambient aqueous synthesis and corresponding literature TGA may be found in a later report. 22 For homochiral MOF Co 2 (L−asp) 2 (bpe), 23 Co 2 (L−asp) 2 (bpy), 23 Co 2 (S−mal) 2 (bpe), 24 Co 2 (S−mal) 2 (bpy) 25 and Zn 2 (D−cam) 2 (bpy), 26 all relevant CIF, adapted synthesis and literature TGA were found in the paper of their respective first report. For Cu(INA) 2 , the MOF was first reported without coordinated water, 27 then with coordinated water soon after. 28 The crystal structures for both are from in a later report. 29 Facile, scalable ambient pressure synthesis and corresponding TGA were found in a distinct publication. 30 For MOF-74 with zinc metal, closest synthetic method and corresponding TGA were taken from an article 31 that followed its initial report. 32 The literature CIF was adapted from a later study. 33 For MOF-74 with cobalt, all required data was found in original report, 34

Differential Scanning Calorimetry Measurements
Aluminium Differential Scanning Calorimetry (DSC) pans of 40 µl, with pin on bottom, were used with matching lids. The aluminium lids were manually pierced with a paper pin.
Dried MOF samples at sorption equilibrium with room temperature air were crushed in an agate mortar. The pan was tightly filled to the rim, and level with a spatula to remove powder from the rim surface. The pierced lids were placed on top in a manual press, and sealed to encapsulate the sample. The net sample weight was measured using the tare of the aluminium parts. The assembled pan and lid with the sample were weighed as well.
Before measuring samples an identically assembled empty lidded pan was used to blank the instrument. A sapphire standard of known heat capacity was then measured until con- The mass of the activated sample was then calculated and the heat capacity curves were renormalized to reflect that desorption was assumed to be complete for evaluated converged traces.
For samples Cu 2 (pdc) 2 (bpy) and CAU-7, an amended DSC measurement method was used for improve the time efficiency of data acquisition.

Assessment of the experimental protocol for Cp measurements
Literature data for solvated MOF was triaged for curves roughly covering our 20°C to 200°C measurement range, and excluding measurements where aluminium DSC pan lids were not pierced. This narrowed our scope to three structures, that also boasted predicted specific heat values based on group contribution type calculations. The much more general machine learning method presented in this paper was put to test to make predictions for these solvated structures, and gave estimates significantly lower than the independent experimental results.
Their reported values were also clear outliers when superimposed on a predicted specific heat to gravimetric density scatter plot of empty frameworks. As the temperature dependence of solvent coordination is not known in the three solvated structures, a contribution of translational as well as vibrational modes to the specific heat may be suggested for the discrepancy. Our experimental data on the specific heat of the empty frameworks was found showed the very pronounced desorption peaks of multiple events. After a run under vacuum, the third and fourth measurements revealed attenuating, though persistent, desorption peaks. The one at midrange is presumably due to resorption owing to incomplete purging, while the one at the higher end to incomplete desorption. After three more runs under vacuum, the eighth run showed an asessible curve for determining specific heat. The consecutively, back to back ran, ninth and tenth measurements matched tightly to establish the convergence of the curves. The fourth and eight curves were omitted from this plot for clarity.

Error estimation and propagation through experimental measurements
The heat capacity at a specific temperature may be written as in eq. 1, as the sum of the product of the corresponding specific heats and the material masses. The heat capacity expresses the change in enthalpy under isobaric conditions as shown in eq. 1 below.
Considering Fourier I. law of thermal conductivity in one dimension, we may obtain the eq. 2 to describe the change in enthalpy at isobaric conditions with a temperature difference and a constant, k, for thermal conductivity. Our experimental setup is closer to an isobaric than to an isochoric system.
Considering eq. 1 and eq. 2 we can express heat capacity with eq. 3 for the heat capacity by connecting the two with the rate of heating in the ramp phase of the DSC measurement.
Reflecting on eq. 1, eq. 2 and eq. 3 the eq. 4 terms below may be written. This holds true for all j samples, each containing i materials.
If we apply eq. 4 for the blank measurement we obtain eq. 5 for the heat capacity of the empty sample holder. If we apply eq. 4 for the sapphire calibration sample we get eq. 6 that now includes the specific heat of sapphire and the sapphire sample mass as well. In identical form to eq. 6, eq. 4 may be applied to all measured samples giving eq. 7 below.
Subtracting eq. 5 from eq. 6 we get eq. 8 for the sapphire standard, which may be simplified to yield eq. 9 below.
Similarly, subtracting eq. 5 from eq. 7 we get eq. 10 for the sample of interest, which may be simplified to yield eq. 11 below.
Dividing eq. 11 with eq. 9, the eq. 12 is obtained which may be simplified to eq. 13 below.
From eq. 13 the eq. 14 is obtained, which is the key to determine the heat capacity of studied samples with DSC, by pegging them to the sapphire standard.
Based on eq. 14 the eq. 15 for the propagation of error may be obtained.
The derivatives in the eq. 15 arise from the first member of the Taylor expansion. The partial derivative be the first measured variable takes the form in eq. 16 based on eq. 14 below. The partial derivative by the other measured variable is shown in eq. 17 based on expression in eq 14 as well.
Consecutive heat flow curves converged tightly upon convergence. As they may not be considered independent due to the undisturbed setup, their true variance may not be estimated with rigour. However, after assessing the propagation of the error stemming from the weight measurements, the latter may be considered as the dominant source of error. The eq. 18 arises similarly to eq. 15 and derivatives in eq. 18 are as shown in eq. 19 and eq. 20 below.
The eq. 29 simplifies to eq. 30 letting the variance of the sample mass be expressed with the variance of the weighing in eq. 31 below.
Substituting from eq. 31 and eq. 24 into eq. 23 the eq. 32 is obtained for the relative variance of the specific heat.
The eq. 32 may be written as eq. 33 or eq. 34 or eq. 35 or eq. 36 as shown.
Considering how obtained activated sample weights compare to the weight of the sapphire standard, we may give a higher estimate for the term shown in eq. 37 below.
Plugging the higher estimate for eq. 37 term back into the original eq. 36 it yields eq. 38, which in turn simplifies to eq. 39 as follows.
By plugging in the mass of the sapphire standard, eq. 40, and the variance of the weight measurement, eq. 41, for a Mettler Toledo XP205 balance, the relative variance is given a high estimate, in eq. 42 below.
σ m = 0.015mg (41) σ cp,s c p,s (T ) ≤0.0074 (42) Assessing this relative variance at 95 % confidence interval, assuming a standard normal distribution, the following, eq. 43, relative error interval arises. At p of 0.05 the z is 1.96 according to the distribution. 4 Process modelling and adsorption data

TSA process
In this study, we reproduced an equilibrium shortcut model developed by Ajenifuja et al. 38 We considered a standard 3-step cycle: (i) adsorption, (ii) open heating, and (iii) open cooling. A schematic representation of the process is given in Fig. 13. The process model is developed in python based on the following assumptions: • The inlet stream is an ideal fluid.
• All the components in the inlet stream are adsorbable.
• Co-adsorption can be described using the ideal adsorbed solution theory (IAST).
• The mass transfer resistance is negligible, i.e., equilibrium between fluid and adsorbed phase is reached instantaneously.
• A heat exchange fluid is used for heating and cooling, and its temperature is homogeneous along the column.
• The heat transfer resistance is negligible.
• The radial gradients, thermal dispersion, and axial mixing are negligible.
• There is no pressure drop across the bed.
For predicting multi-component adsorption equilibrium, we linked the process model to the pyIAST python package 39 which allowed us to calculate the co-adsorption behavior of the mixture using pure-component adsorption data.
For demonstration, we assumed that the stream is a binary mixture of CO 2 and N 2 and both components are adsorbable. We also assume that CO 2 is the strongly adsorbed component and, therefore, we screen only for materials that have a selectivity of CO 2 /N 2 greater than one. We also considered that CO 2 is captured from the flue gas stream of a coal-fired power plant and all assumed process, material and case-study related parameters are listed in Table 3.
In this study, we used the TSA equilibrium model to evaluate the process thermal energy requirement, working capacity, and purity of the product collected at step (ii). The process performance indicators are defined as: Here, ∆H sens and ∆H ads are the process total sensible heat and heat of adsorption requirements per TSA cycle. N product,i is the amount of species i leaving the column in step (ii) and m s is the mass of the solid. ∆H j sens and ∆H j ads at step j are governed by: Here T is the temperature of the column, c p denotes the specific heat capacity, N i denotes the amount of species i, ∆H i refers to the heat of adsorption of species i, and ∆N i is the difference in the amount of species i between step j and step (j − 1). The subscripts col, ads and s refer to the column, the adsorbed phase and the solid, respectively.

The role of water
Most, if not all, screening studies follow a stepwise approach, i.e. they start screening materials for dry flue gasses and the effect of contaminants like water is considered at a later stage.
After this initial screening under dry conditions, one can envision three different scenarios: Scenario A. For those top-performing materials under dry conditions, which lose their selectivity in the presence of water, drying of the flue gas will be required. Clearly, this drying step adds to the overall energy requirement. Therefore, in order to assess whether this scenario is worth pursuing, accurate knowledge of the heat capacity and heat requirement for a dry flue gas separation is essential.
Scenario B. The water uptake of the materials is calculated and no drying of the flue gas is required because the materials show a small water uptake and, more importantly, this uptake does not affect their selectivity for CO 2 . An example of such a material is Al-PMOF for which we estimated the contribution of adsorbed CO 2 , N 2 , and H 2 O to the total heat requirement. Under wet flue gas conditions, the amount of H 2 O that is adsorbed is relatively small and accounts for less than 1% of the heat requirement (see below for the details).
Scenario C. The water uptake of the materials is calculated, and the adsorbed amount is large. In this case, the heat capacity needs to be corrected for the adsorbed water, but these structures are of little relevance for carbon capture as their lack of selectivity and their poor energy efficiency render them unfeasible for CO2 separation.
Ultimately, a detailed techno-economic evaluation is required to identify which materials in scenarios A and B will be optimal for a CO 2 capture process. In these two scenarios, knowledge of the heat capacity of the material is essential, as it is a major contributor to the thermal energy requirement of the process. The use of an average or incorrect value for the heat capacity can significantly affect the calculation of the energy requirements for the process which, in turn, will translate into an overestimation of the costs for the carbon capture process.

The role of heating the adsorbates to the total heat requirement
It is important to estimate the contribution of heating the adsorbates to the total heat requirement. We take Al-PMOF as a prototypical adsorbent material for carbon capture applications in the presence of water. 42 The machine learning estimated heat capacity of Al-PMOF is 0.97J.g −1 .K −1 , and the experimental CO 2 and H 2 O uptake of the material at the flue gas conditions are 2 mmol/g and 0.1 mmol/g, respectively. Assuming a conservative scenario that all the adsorbates are heated up to the desorption temperature, we can use to estimate the ratio of the heat requirements. The required heating of the CO 2 and H 2 O accounts for less than 10% and 1% of the total heat requirement, respectively, taking the heat capacity of CO 2 and H 2 O being 0.84 J.g −1 .K −1 and 4.17 J.g −1 .K −1 , respectively. For the structures where the amount of adsorbed water is large, the contribution is not negligible.
However, these structures are of little relevance for carbon capture as their lack of selectivity and their poor energy efficiency render them unfeasible for CO 2 separation.

Adsorption data calculations
The CO 2 and N 2 pure component isotherms were calculated for the MOF structures. We assume frameworks to be rigid and only host-guest and guest-guest interactions are modelled using coulomb and Lennard-Jones type interactions. For the coulomb interactions, we use DDEC partial charges on the atoms of the materials and compute the interactions using Ewald summation. Furthermore, we use a combination of universal force field (UFF) 43 and TraPPE 44 force fields to model the host-guest interactions and TraPPE force field for the guest-guest interactions. The molecular interactions were cutoff at 12Å and the potential is tail-corrected. 45 Grand canonical Monte Carlo (GCMC) is used for computing the adsorption properties using 1000 cycles for initialisation and 10000 for production.
All the pure component isotherm calculations are run using the associated AiiDA workflow named IsothermAccurateWorkChain, which is part of the aiida-lsmo plugin. 2 This computational workflow was developed to ensure accurate sampling of material adsorption isotherms covering Henry's regime, the saturation plateau, and the curvature in between.
All the isotherms were computed at room temperature, and the Clausius-Clapeyron relation was used to extrapolate to other temperatures. Ideal Adsorbed Solution Theory (IAST) 46 was used to predict binary CO 2 and N 2 uptakes for the different process conditions. The calculations were done using a python package, Ideal adsorbed solution theory Python package (pyIAST) 39 that linearly interpolates IAST integrals using numerical quadrature.