Extensive High-Accuracy Thermochemistry and Group Additivity Values for Halocarbon Combustion Modeling

Standard enthalpies, entropies, and heat capacities are calculated for 16,813 halocarbons using an automated high-fidelity thermochemistry workflow. This workflow generates conformers at the density functional tight binding (DFTB) level; optimizes geometries, calculates harmonic frequencies, and performs 1D hindered rotor scans at the DFT level; and computes electronic energies at the G4 level. The computed enthalpies of formation for 400 molecules show good agreement with literature references, but the majority of the calculated species have no reference in the literature. Thus, this work presents the most accurate thermochemistry for many halocarbons to date. This new data set is used to train an extensive ensemble of group additivity values and hydrogen bond increment groups within the Reaction Mechanism Generator (RMG) framework. On average, the new group values estimate standard enthalpies for halogenated hydrocarbons within 3 kcal/mol of their G4 values. A significant contribution towards automated mechanism generation of halocarbon combustion, this research provides thermochemical data for thousands of novel halogenated species and presents a self-consistent set of halogen group additivity values.


Introduction
Halogenated Hydrocarbons (HHCs)
Halogenated hydrocarbons (HHCs) are commonly used as flame suppressants and refrigerant working fluids. The first generation of these compounds, chlorofluorocarbons (CFCs) and hydrochlorofluorocarbons (HCFCs), depleted the ozone layer and were banned worldwide under the Montreal Protocol in the 1980s. 1 The second generation, hydrofluorocarbons (HFCs), are ozone-friendly but are currently being phased out due to their high global warming potentials (GWPs). 2 Despite these controls on high-GWP HFC production, a recent study discovered that emissions of HFC-23 (CHF3), a potent greenhouse gas, reached an historic high in 2018. 3 To address these environmental concerns, several low-GWP HHC refrigerants and suppressants have been proposed. However, the chemical properties that make these HFCs more environmentally friendly also increase their flammability. 4 Therefore, the combustion properties of these proposed HHCs are of the utmost concern. Since experimental studies of these properties are complex and costly, predictive kinetic modeling of HHC combustion is crucial in screening proposed compounds in order to facilitate their innovation and implementation.
Understanding the complex chemistry of new compounds and predicting their combustion behavior under different conditions requires the compilation and simulation of detailed kinetic mechanisms (or microkinetic models), which often contain thousands of elementary reactions among hundreds of intermediate species. Building these models by hand is extremely challenging and error-prone due to the vast number of possible species and reactions to consider, the sparse thermokinetic data available in the scientific literature, and the biases of the person choosing which pathways to include. Thus, a tool that generates these models automatically by enumerating and evaluating the many potential pathways by which HHCs combust will be instrumental in screening the flammability of greener refrigerants and suppressants.

Reaction Mechanism Generator
Reaction Mechanism Generator (RMG) is an open-source software package that automatically builds detailed kinetic models by proposing elementary reactions and estimating chemical properties (physical, thermochemical, kinetic, solvation, etc.) using a database of reaction templates, thermokinetic data, and estimation methods. 5,6 These chemical properties are first sought in a database of known parameters, but are more commonly estimated using hierarchical decision trees. Thermochemical parameters

Thermochemistry for HHCs
As kinetic modeling of HHCs has been studied for many decades, thermochemical data for these molecules and their derivatives have accumulated through various theoretical in-

Methods
The enum-halocarb4 dataset
Due to a scarcity of thermochemical data for HHCs in the literature, a new dataset, enum-halocarb4, was compiled in this work. In order to obtain high coverage and diversity of CHO-(F,Cl,Br) chemical space, this dataset was created by "halogenating" a systematically enumerated set of over 600 CHO species containing up to 4 heavy atoms generated by Margraf et al. 31 This "halogenating" process involved systematic substitutions of halogen atoms (F,Cl,Br) for hydrogens using RDKit. 32 Molecules that contain more than one halogen element (CF3Br for example) are referred to as "mixed-halogens" and molecules with only a single halogen element (CF4 for example) are called "mono-halogens". To reduce the size of the dataset, enum-halocarb4 was pruned by removing:
• radical species with more than one unpaired electron
• mixed-halogens with more than 8 heavy atoms
• cyclic mixed-halogens
The chemical composition of the 16,813 molecules in enum-halocarb4 is shown in Table 2.
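The substitution and pruning rules above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this work: it operates on per-site labels rather than RDKit molecular graphs, so symmetry-equivalent substitution patterns are not merged, and the function names (`substitution_patterns`, `halogen_class`, `keep_species`) are hypothetical.

```python
from itertools import product

HALOGENS = ("F", "Cl", "Br")


def substitution_patterns(n_hydrogens):
    """Yield every assignment of H/F/Cl/Br to the hydrogen sites
    that replaces at least one hydrogen with a halogen."""
    for pattern in product(("H",) + HALOGENS, repeat=n_hydrogens):
        if any(atom in HALOGENS for atom in pattern):
            yield pattern


def halogen_class(pattern):
    """Classify a pattern as 'mono-halogen' (one halogen element)
    or 'mixed-halogen' (more than one halogen element)."""
    elements = {atom for atom in pattern if atom in HALOGENS}
    if not elements:
        return "non-halogenated"
    return "mono-halogen" if len(elements) == 1 else "mixed-halogen"


def keep_species(pattern, parent_heavy_atoms, is_cyclic, unpaired_electrons):
    """Apply the three pruning rules listed in the text."""
    if unpaired_electrons > 1:
        return False
    if halogen_class(pattern) == "mixed-halogen":
        # Each substituted halogen adds one heavy atom to the parent.
        heavy_atoms = parent_heavy_atoms + sum(a in HALOGENS for a in pattern)
        if heavy_atoms > 8 or is_cyclic:
            return False
    return True


# Methane: one heavy atom, four hydrogen sites, closed shell, noncyclic.
methane_patterns = list(substitution_patterns(4))
kept = [p for p in methane_patterns if keep_species(p, 1, False, 0)]
```

In a graph-based implementation, a canonical representation (for example a canonical SMILES string) would be needed to remove the duplicates that this label-based enumeration produces.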

Thermochemistry Workflow
The automated thermochemistry workflow used to calculate high-level thermochemical parameters is shown in Figure 1.
First, a SMILES representation of the molecule is used to generate a molecular graph of the species using RMG. Then, the molecule is embedded with RDKit 32 to create a 3D geometry.
After embedding, conformers are investigated using the systematic conformer generation algorithm implemented in AutoTST. 33
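The order of the steps can be summarized as a list of stages, each tagged with the tool or level of theory named in the text and abstract. The sketch below is only a bookkeeping illustration: the `Stage` class and the stage names are assumptions made for exposition, and no quantum chemistry is run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    level: str  # tool or level of theory used for this stage


# Order follows the workflow description; names are illustrative only.
WORKFLOW = (
    Stage("molecular graph from SMILES", "RMG"),
    Stage("3D embedding", "RDKit"),
    Stage("conformer generation", "DFTB"),
    Stage("geometry optimization", "DFT"),
    Stage("harmonic frequencies", "DFT"),
    Stage("1D hindered rotor scans", "DFT"),
    Stage("electronic energy", "G4"),
)


def stages_at(level):
    """Return the names of all stages run at a given level."""
    return [stage.name for stage in WORKFLOW if stage.level == level]


for step, stage in enumerate(WORKFLOW, start=1):
    print(f"{step}. {stage.name} ({stage.level})")
```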

Group Additivity Values
The thermochemical data in the enum-halocarb4 dataset were used as training data to fit 2,041 new halogen thermo group additivity values (GAVs) in RMG. Five types of thermo additivity groups were derived: conventional nearest-neighbor Benson groups, hydrogen bond increment groups (HBI), cyclic and noncyclic non-next-nearest neighbor interaction groups (long-distance), and ring correction groups. Table 4 shows the number of groups derived for each group type, and Table 3 demonstrates how these groups combine to estimate the enthalpy of formation for a hydrofluorocarbon radical CH3CH2CHFCF2 calculated in enum-halocarb4. HBIs. To improve the accuracy of the HBI scheme, 1,025 second nearest-neighbor HBIs and 320 three- and four-membered ring HBIs were derived using 6,064 noncyclic radicals and 717 cyclic radicals, respectively.
Cyclic groups were also included to account for ring strain in halogen-substituted rings.
247 cyclic corrections and 56 long-distance cyclic interactions were derived from 796 closed-shell three- and four-membered ring species in enum-halocarb4. No corrections were included for ring sizes greater than four atoms since the maximum ring size in enum-halocarb4 is four.
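How the group types combine into a single estimate can be illustrated with a simple sum. The sketch below treats every contribution, including the HBI term for radicals, as a plain additive increment; the bookkeeping RMG applies internally may differ. The group labels and all numerical values are placeholders invented for illustration, not values fitted in this work.

```python
from dataclasses import dataclass, field


@dataclass
class GroupContributions:
    """Enthalpy increments (kcal/mol) by group type. All values are
    placeholders for illustration, not fitted GAVs."""
    benson: dict = field(default_factory=dict)
    hbi: dict = field(default_factory=dict)
    long_distance: dict = field(default_factory=dict)
    ring: dict = field(default_factory=dict)

    def total(self):
        """Sum every increment across all group types."""
        return sum(
            value
            for table in (self.benson, self.hbi, self.long_distance, self.ring)
            for value in table.values()
        )


# Hypothetical decomposition of a noncyclic fluorinated radical.
estimate = GroupContributions(
    benson={
        "C-(C)(H)3": -10.0,
        "C-(C)2(H)2": -5.0,
        "C-(C)2(F)(H)": -50.0,
        "C-(C)(F)2(H)": -110.0,
    },
    hbi={"radical correction": 45.0},
    long_distance={"F/F on adjacent carbons": 3.0},
)
print(f"Estimated enthalpy of formation: {estimate.total():.1f} kcal/mol")
```

For a ring species, the `ring` table would carry the ring correction and any cyclic long-distance terms; it is left empty here because the example molecule is noncyclic.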

Test Set Generation
To assess the accuracy of RMG's new halogen thermo groups in estimating thermochemistry of intermediates created during automated generation of HHC combustion models, an RMG model was constructed for 2-bromo-3,3,3-trifluoropropene (2-BTP) and CF3Br in methane flames. Before generating a model, a literature mechanism for 2-BTP from NIST 45 was imported into RMG. In order to teach RMG how these two flame suppressants behave in hydrocarbon flames, 727 of the 1,610 reactions in the literature mechanism were added as training reactions to RMG's reaction families. Adding these reactions into training helps RMG generalize and improves rate estimates of reactions with similar functional groups.
Then, an RMG model was built using the Foundational Fuel Chemistry Model Version 1.0 46 in RMG-database as a seed mechanism. enum-halocarb4 was used as an RMG thermo library during 2-BTP model generation, should RMG need thermochemistry for an intermediate in that dataset; other HHCs were estimated using the new GAV scheme. Our description is brief because the goal here is not to produce a better kinetic model, but to generate a test set of molecules from a realistic scenario of automated model generation with RMG. 104 HHCs from the resulting RMG mechanism that were not in the enum-halocarb4 GAV training set were recalculated at G4 level using the automated thermochemistry workflow previously discussed, and the calculated thermochemical properties were compared to GAV estimates.
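The lookup order described above (thermo library first, GAV estimate as a fallback) and the selection of test-set species can be sketched as follows. The dictionary, the `placeholder_gav` function, and the numerical values are hypothetical and do not reflect RMG's actual API or data. Species are matched here by raw SMILES strings, which is a simplification; a real implementation would compare molecular graphs.

```python
def get_enthalpy(smiles, thermo_library, gav_estimate):
    """Return (value, source): use the library entry when available,
    otherwise fall back to the group additivity estimator."""
    if smiles in thermo_library:
        return thermo_library[smiles], "library"
    return gav_estimate(smiles), "GAV"


def select_test_set(mechanism_species, training_set):
    """Keep species from the generated mechanism that were not used
    to train the GAVs, preserving mechanism order."""
    return [s for s in mechanism_species if s not in training_set]


def placeholder_gav(smiles):
    """Stand-in for a real GAV estimate; returns a made-up value."""
    return -100.0


# Placeholder data: SMILES keys with made-up enthalpies in kcal/mol.
library = {"FC(F)(F)Br": -155.0}

value, source = get_enthalpy("FC(F)(F)Br", library, placeholder_gav)
fallback_value, fallback_source = get_enthalpy(
    "C=C(Br)C(F)(F)F", library, placeholder_gav
)
test_set = select_test_set(["FC(F)(F)Br", "C=C(Br)C(F)(F)F"], set(library))
```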

Results and discussion
Thermochemistry Workflow Benchmark
To evaluate the accuracy of our automated workflow in calculating ΔfH°(298 K) using the G4 method, 400 enum-halocarb4 species were benchmarked against literature data. Figure 2 shows the distribution of the error for the benchmark set compared to reference ΔfH°(298 K) from the Active Thermochemical Tables (ATcT) 47 and various literature sources. 13-18,22-24,27-30 With a mean absolute error (0.83 kcal/mol) within chemical accuracy (≤ 1 kcal/mol), G4 is a suitable, relatively low cost composite quantum chemistry method for high-fidelity and high-throughput calculations of HHCs. However, for heavily halogenated systems, G4 and other composite methods do not compute enthalpies within chemical accuracy. 48 Calculated G4 enthalpies for C2Cl5 and C2Cl6 in enum-halocarb4 are more than 3 kcal/mol lower than ATcT values. It is likely that heavily halogenated molecules in enum-halocarb4 have higher errors than their more sparsely halogenated counterparts. To more accurately compute thermochemistry for heavily halogenated molecules, error-canceling reactions, such as isodesmic reactions, could be used in place of the atomization approach used in this work. However, incorporating automated error-canceling reaction generation within our workflow was beyond the scope of the present work.
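The benchmark statistic quoted above is a mean absolute error against reference values. A minimal sketch of that calculation follows, with made-up enthalpy pairs standing in for the actual benchmark data and hypothetical function names. The last pair mimics a heavily halogenated outlier that is too low by more than 3 kcal/mol.

```python
CHEMICAL_ACCURACY = 1.0  # kcal/mol, threshold used in the text


def mean_absolute_error(calculated, reference):
    """Average of |calculated - reference| over paired values."""
    if len(calculated) != len(reference):
        raise ValueError("calculated and reference must have equal length")
    total = sum(abs(c - r) for c, r in zip(calculated, reference))
    return total / len(calculated)


def outliers(calculated, reference, threshold):
    """Indices of species whose absolute error exceeds the threshold."""
    return [
        i
        for i, (c, r) in enumerate(zip(calculated, reference))
        if abs(c - r) > threshold
    ]


# Placeholder enthalpies of formation in kcal/mol (not real data).
g4_values = [-17.5, -56.0, -166.2, -36.5]
ref_values = [-17.8, -55.5, -166.7, -33.0]

mae = mean_absolute_error(g4_values, ref_values)
within_chemical_accuracy = mae <= CHEMICAL_ACCURACY
flagged = outliers(g4_values, ref_values, threshold=3.0)
```

With these placeholder numbers a single large deviation is enough to push the mean above the 1 kcal/mol threshold, which is why outliers are worth listing separately from the average.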

Group Additivity Values Performance
The new GAVs derived in this work dramatically improve RMG's estimates of ΔfH°(298 K). To further examine their accuracy and fidelity, the GAV performance with the new groups was evaluated for four different classes of molecules within the enum-halocarb4 dataset: noncyclic closed-shell, noncyclic radical, cyclic closed-shell, and cyclic radical. Figure 4 shows the GAV performance for each molecule class. For noncyclic HHCs, which are a majority of the dataset (∼ 88%), the GAVs are accurate at estimating thermochemical parameters for both closed-shell molecules and radicals. For noncyclic closed-shell molecules, the MAD for This disparity is most likely a result of only including closed-shell cyclic species in the derivation of the cyclic corrections and long-distance cyclic interaction groups. Because the cyclic radicals are effectively a test set for these groups, whereas the closed-shell cyclics were included in training, the GAVs perform worse for the cyclic radicals.
Long-distance groups are included in the new GAV scheme to account for halogen/halogen and halogen/oxygen interactions on adjacent carbons. This modified group additivity approach was shown to significantly reduce errors in estimates of fluorinated and chlorinated hydrocarbons. 12,19 To investigate their impact in this work, the long-distance interactions were removed from the ensemble of GAVs, and estimated ΔfH°(298 K) without these groups were compared to estimates with the groups included for molecules in enum-halocarb4 that have halogens on adjacent carbons. Figure 5 shows that, without the contributions from long-distance interactions, the GAVs systematically underestimate the ΔfH°(298 K) of these molecules by over 5 kcal/mol on average. In other words, GAVs overpredict stability of HHCs if lacking a long-distance term to capture the destabilizing interaction between halogens or oxygens on adjacent carbons. This indicates that these long-distance interactions are essential to accurately predict thermochemical properties of HHCs using a group additive scheme.
Figure 5: GAV enthalpy of formation estimates with long-distance groups (with LD, in blue) and without long-distance interaction groups (without LD, in red). Without long-distance groups, GAVs systematically underpredict enthalpies for HHCs.
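The sign convention matters for this argument: a negative signed deviation (estimate minus reference) means the estimated enthalpy is too low, i.e., the molecule is predicted to be too stable. The sketch below uses invented numbers and hypothetical per-molecule long-distance sums to show how dropping a positive long-distance term shifts the mean signed deviation downward.

```python
def mean_signed_deviation(estimates, references):
    """Mean of (estimate - reference); negative means underprediction."""
    total = sum(e - r for e, r in zip(estimates, references))
    return total / len(estimates)


# Placeholder data (kcal/mol): reference values and, per molecule,
# the sum of ordinary groups and the sum of long-distance terms.
references = [-100.0, -150.0, -210.0]
base_groups = [-104.0, -157.0, -214.0]
long_distance = [4.5, 6.0, 5.0]

with_ld = [b + ld for b, ld in zip(base_groups, long_distance)]
without_ld = base_groups

msd_with = mean_signed_deviation(with_ld, references)
msd_without = mean_signed_deviation(without_ld, references)
```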

RMG Test Set
As automated mechanism generation is the intended application, the new halogen GAVs were implemented in RMG, and RMG was used to construct a test set. Since RMG explores a wide variety of molecules as it builds a mechanism, RMG molecules provide a challenging test for GAVs and other graph-based molecular property estimators. The performance of the GAVs for the 2-BTP RMG test set is shown in Figure 6. As expected, the GAVs estimated ΔfH°(298 K) less accurately for the RMG test set than for the training set. The GAVs performed well for noncyclic test set molecules with mean absolute deviations of 2.85 kcal/mol for closed-shell species and 4.19 kcal/mol for radicals. However, the GAVs performed relatively poorly for cyclic compounds, with mean absolute deviations of 6.89 kcal/mol for closed-shell molecules and 6.44 kcal/mol for cyclic radicals. The poorer performance for the cyclic species can mainly be attributed to the following three factors.
First, due to a lack of training data, the GAVs do not include halogen-specific ring corrections for ring sizes greater than 4 atoms. The mean absolute deviation for cyclic species with ring sizes greater than 4 was 7.42 kcal/mol compared to 6.09 kcal/mol for three- and four-membered rings. Second, the long-distance corrections in RMG are not applied for neighboring atoms if one atom is in a ring and the other atom is outside the ring. A missing long-distance interaction between halogens on atoms in and out of a ring would lead to systematic underprediction of ΔfH°(298 K) for these types of molecules, which may be reflected in the negative mean signed deviation (MSD) for cyclic compounds in the test set. Third, because RMG was used to build the test set and RMG prefers to incorporate low energy molecules in its models (endothermic reactions are typically slower), there is a bias for underestimated (i.e., lower energy) molecules to be selected for the test set. In other words, it is more likely for molecules that the GAVs underestimated (negative MSD) to end up in the test set than overestimated or correctly estimated molecules. Although the current GAV estimates do not perform as well for cyclic halocarbons as noncyclic ones, the estimates are good enough to be helpful in the automated construction of microkinetic models, and are a vast improvement on RMG's estimates before this work.
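The third factor is a selection effect, which a small simulation can illustrate. The sketch below assumes normally distributed true enthalpies and unbiased estimation errors with arbitrary widths, and crudely models RMG's preference for low-energy species as keeping the lowest-estimated fraction of a pool. None of these choices come from this work; they only show that selecting on estimated values can produce a negative mean signed deviation even when the estimator itself is unbiased.

```python
import random


def simulate_selection_bias(n=10_000, keep_fraction=0.2, seed=0):
    """Return (MSD of the whole pool, MSD of the selected subset)."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        true_value = rng.gauss(0.0, 10.0)  # arbitrary spread, kcal/mol
        error = rng.gauss(0.0, 3.0)        # unbiased estimator error
        pool.append((true_value + error, error))

    # Keep the species with the lowest *estimated* enthalpies.
    pool.sort(key=lambda pair: pair[0])
    selected = pool[: int(n * keep_fraction)]

    msd_pool = sum(err for _, err in pool) / len(pool)
    msd_selected = sum(err for _, err in selected) / len(selected)
    return msd_pool, msd_selected


msd_pool, msd_selected = simulate_selection_bias()
print(f"pool MSD: {msd_pool:+.2f}, selected MSD: {msd_selected:+.2f}")
```

Species whose errors happen to be negative are more likely to fall in the low-estimate tail, so the selected subset inherits a negative mean error while the full pool stays close to zero.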

Conclusions
This research provides thermochemical data for thousands of novel halogenated species and presents a comprehensive, self-consistent set of halogen group additivity values within the Reaction Mechanism Generator framework. The new GAVs accurately estimate enthalpies of formation for noncyclic closed-shell and radical species in enum-halocarb4 and an RMG test set, but show poorer performance for rings, for which more thermochemical training data are needed. Overall, the new halogen GAVs substantially improve RMG's thermo estimates for halocarbons, reducing the mean absolute deviation of ΔfH°(298 K) for the enum-halocarb4 dataset from 66 to 2.21 kcal/mol. Importantly, these new groups will enable rapid and accurate on-the-fly estimation of halocarbon thermochemistry during automated model generation, thereby improving the fidelity and reliability of RMG's halocarbon combustion models for next-generation eco-friendly refrigerants and flame suppressants. This work also contributes a new data set, enum-halocarb4, which provides essential thermochemical data in a sparsely populated region of chemistry for training other machine learning estimation methods.