An Explosophore-Based Approach Towards the Prediction of Energetic Material Sensitivity Properties

Accurate prediction of the sensitivity properties of high-energy materials (HEMs) and the study of their decomposition mechanisms are two major focuses within energetics research. Due to the hazards associated with the synthesis and handling of energetic materials, predictive models for HEM sensitivity are of great importance in enabling the safe and efficient development of future HEMs. Traditional predictive modeling of HEM decomposition via machine learning algorithms generally displays limited interpretability, while mechanistic studies of HEMs typically focus on small subsets of structurally analogous compounds lacking generalizability. This study aims to bridge the gap between predictive modeling and computational mechanistic analysis of HEMs, with the goal of providing chemically interpretable models for HEM sensitivity property prediction. Herein, we disclose the use of multivariate linear regression (MLR) modeling for the prediction of the decomposition temperature and impact sensitivity of HEMs. We report an explosophore-based approach to sensitivity property prediction featuring an ensemble of quantum mechanical parameters and computational workflows that enable rapid parameterization and modeling of energetic functional groups. We then employ these methods to accurately predict sensitivity properties of nitrogen-rich tetrazole and azide HEMs. These statistical MLR models are readily interpreted based on the principles of physical organic chemistry, producing structure-property relationships to guide the rational design of new HEMs. Furthermore, we extend our explosophore-based approach to predict the sensitivity properties of HEMs containing multiple, non-equivalent energetic functional groups through the identification of molecular triggers for the bulk decomposition of HEMs. Finally, we showcase the viability of our methods towards ab initio virtual screening of HEMs through predictive modeling of external test sets of tetrazole HEMs using structures and parameters generated exclusively in silico. ______________________________________________________________________________________________________________________________________________


Introduction
An important consideration in the design of energetic materials is their sensitivity to decomposition induced by external stimuli. The practicality of any high energy material (HEM) is immediately constrained by its ability to be safely synthesized, formulated, and stored without significant risks of deflagration or detonation. In contrast to numerous studies that aim to predict the performance properties (detonation velocity, detonation pressure, etc.) of HEMs, 1-3 analogous methods for the prediction of the sensitivity properties of HEMs (decomposition temperature (Tdec), impact sensitivity (IS), friction sensitivity, electric spark sensitivity) remain elusive. In the absence of robust predictive tools, the development of new energetic materials is often subject to time-consuming cycles of synthesis and characterization in which chemical insight is gained almost entirely through empirical trial-and-error. 4 As such, the ability to precisely tune the properties of HEM candidates based on their chemical structures offers the potential to accelerate the development of new HEMs for military and civilian applications. Streamlined methods for the ab initio design of HEMs can also minimize hazardous experimental work associated with the preparation of these materials. Benefits of such methods extend further to the synthesis of fine chemicals, 5 where energetic motifs such as nitrogen-containing heterocycles, azides, and nitro groups are commonly encountered in synthetic intermediates and in bioactive molecules. 6 Therefore, understanding the decomposition thresholds of these motifs is also a component of improved safety in process development chemistry.
Numerous experimental and computational studies have provided valuable insight into the decomposition of HEMs. [7][8][9][10][11] However, the narrow scope of such studies often results in limited applicability for the streamlined screening of large numbers of HEM candidates. Mechanistic studies of HEM decomposition are usually performed on datasets consisting of a single molecule or a narrow set of related molecules. 7,[12][13][14][15] Conclusions drawn from such studies can lead to poor generalizability when applied to structurally dissimilar compounds. Furthermore, the decomposition of energetic materials is often governed by multiple mechanisms with comparable activation energies, hindering rational design of HEMs based on a single mechanistic proposal. 16 Additionally, theoretical mechanistic studies involving excited states and transition state calculations remain a computationally expensive exercise, limiting their applicability for virtual screening of large numbers of structurally diverse compounds. Analogous challenges of limited generalizability and ballooning resource utilization are also encountered in other areas of chemistry such as asymmetric catalysis, organic methodology development, and battery research. These constraints are increasingly addressed by the implementation of data-driven approaches which aim to construct quantitative structure-property relationships (QSPRs) to aid prediction and guide rational design. In this vein, multivariate linear regression (MLR) statistical modeling is an attractive approach for developing QSPRs due to its inherent interpretability when used in conjunction with physically relevant molecular descriptors. 17 We sought to utilize an MLR-based approach to construct sensitivity property models which would reveal key mechanistic features in HEM decomposition for energetic material families at the forefront of current research.
Many commercial HEMs suffer from a combination of harsh nitration conditions in synthesis (RDX, HMX, TNT, CL-20), significant toxicity (RDX, Pb(N3)2, AP), or deleterious environmental impacts in storage and use. In light of these factors, nitrogen-rich explosives have garnered significant attention as replacements for canonical primary and secondary explosives. [18][19][20] However, a lack of reported structure-property relationships for nitrogen-rich functionalities such as organic azides and nitrogen-containing heterocycles hampers the rational design of new heterocyclic explosives and structural tuning of existing HEMs.
In order to elucidate these QSPRs, we employed an explosophore-based approach to describe and classify HEMs based on the presence of energetic functional groups known as explosophores ( Figure 1A), typical examples of which include nitro groups, nitramines, azides, and nitrogen-containing heterocycles. Explosophores are widely acknowledged to be responsible for the initial mechanistic steps leading to the deflagration or detonation of an HEM. 21 As such, detailed molecular description of explosophores is paramount in the development of chemically interpretable predictive models that can guide the design of next-generation HEMs. To enable this study, we compiled a highly diverse sensitivity property database containing over 400 HEMs reflecting prevailing trends in energetics research ( Figure 1B). 22 To capture the mechanistic importance of explosophores, we designed a general set of structural and quantum mechanical descriptors that can be applied to any type of explosophore, yielding a rich description of the chemical structures responsible for decomposition events. This descriptor ensemble includes natural bond orbital (NBO) analysis, Hirshfeld surface analysis, vibrational frequencies, charges, and the molecular geometry of explosophores. With this descriptor set in hand, we employed MLR modeling to develop chemically interpretable predictive models for the decomposition temperature and impact sensitivity of tetrazole and azide HEMs ( Figure 1C). Through several case studies, we demonstrate the versatility of our explosophore-based approach for accurate prediction of HEM sensitivity properties and identification of structure-property correlations useful in rational design, as well as for corroboration of proposed decomposition mechanisms.

Dataset Curation
We first compiled a sensitivity property dataset comprised of 386 energetic materials synthesized by the research groups of Klapötke and Shreeve. 23 The structural diversity contained within our dataset is extensive, featuring compounds representative of nearly every major explosophore class in numbers sufficiently large for the application of MLR modeling. For each of the HEMs in our dataset, sensitivity properties (i.e., Tdec and IS) and experimental crystal structures were obtained from literature sources. 24 Notably, HEMs containing metal centers and ionic structures with associated energetic counterions and oxidizers (perchlorate, guanidinium, etc.) were excluded as the presence of these auxiliary components substantially alters the sensitivity properties of HEMs, precluding efforts to study the decomposition mechanisms of explosophores on organic molecules. 25 Consistency between the methods used to collect experimental HEM sensitivity properties is also a requirement for the compilation of high-quality datasets for predictive modeling. Differential scanning calorimetry (DSC) is typically employed to measure the decomposition temperature of energetic materials, the precision of which is dependent on the rate at which a sample is heated. As such, our dataset contains compounds subjected to the same DSC procedure. 26 Impact sensitivity measurements are typically conducted using mechanical tests and are often not reported with a high degree of precision. 27 In our study, we transformed experimental IS measurements to a logarithmic scale (log(IS)). 28 The diversity of the compounds in our dataset required several simple yet important filtering steps prior to the generation of data subsets appropriate for MLR modeling. Perhaps intuitively, different explosophores impart different levels of sensitivity to a given HEM. Therefore, it is challenging to develop an interpretable MLR model using a training set that includes molecules spanning several different explosophore classes. To address this issue, we utilized Python scripts to sort and categorize HEMs using SMARTS queries and SMILES keys. 29 The result of this selection was the creation of a number of subsets (i.e., monoazides, polyazides, tetrazoles, ditetrazoles) that maintained structural diversity about the constraint of a common explosophore. Statistical models generated from these datasets could then be used to extract chemical insight characteristic of a specific explosophore class.

Parameter Acquisition
Utilization of available experimental crystal structures for the HEMs in our compound library was initially convenient for featurization efforts in lieu of structures calculated using molecular mechanics. HEM sensitivity properties are also influenced by the solid state structure of the crystal lattice. 30,31 As later discussions will illustrate, we also demonstrate that our methods can be used to model compounds lacking experimental crystal structures in order to support streamlined ab initio screening of HEM candidates.
With the 3D structures of our HEMs as input geometries, we conducted DFT optimizations (B3LYP/6-31+G(d,p)) and single point calculations (M06-2X/def2-TZVP) in the gas-phase. Extraction of a suite of geometric and electronic parameters for the explosophores in these HEMs ( Figure 2) was then conducted using automated computational tools (disclosed in the SI). These parameters included NBO charges, electrostatic potential (ESP) charges, predicted NMR shifts, NBO orbitals, Sterimol descriptors, vibrational frequencies, and a series of geometric descriptors such as bond angles, distances, and dihedral angles. Mathematical combinations of these parameters could provide additional parameters for evaluation such as bond orders and HOMO-LUMO gap energies (see SI for details). Lastly, parameters calculated from Hirshfeld isosurfaces were obtained to explore correlations between crystal packing and HEM sensitivity. 32

QSPR for Tetrazole HEMs
Workflow: Explosophores are the key reactive moieties responsible for the initiation of HEM deflagration and detonation. The viability of our explosophore-based approach to sensitivity prediction was initially explored using subsets of our larger dataset featuring HEMs containing only a single explosophore. Our workflow proceeds by first identifying an explosophore family of interest, followed by interrogation of a sensitivity property for prediction (Tdec or log(IS)). This selection process typically yielded datasets of 30-50 compounds suitable for MLR modeling. Of the HEMs in a selected explosophore class, compounds with multiple, nonequivalent instances of the explosophore are treated separately (vide infra). Once a dataset was curated, we then parameterized each molecule using our ensemble of chemical descriptors.
With experimental and descriptor datasets in hand, we applied MLR modeling to develop QSPRs for HEM sensitivity according to the general workflow described in Figure 3. First, an initial set of MLR models is generated for the sensitivity property of interest using forward stepwise regression. Multiple leading models are considered in terms of the number of parameters in the model, avoidance of cross-terms, and the recorded regression statistics (see SI for details). Notably, consideration of the univariate correlations between descriptors and sensitivity properties was a useful tool for model selection and the construction of additional descriptors to improve model performance. Secondly, univariate correlations can be explored to identify outlier compounds, i.e., compounds where the presence of a neighboring structure (e.g., acyl azides in an azide dataset) alters the properties of an explosophore in a manner that categorically differentiates it from the remainder of the test set. Inclusion of such compounds has the capacity to disproportionately influence the overall model. All suspected model outliers were verified using a one-tailed Grubbs outlier test at the 0.05 significance level. 33 When deemed appropriate, removal of such outliers was conducted and new models were generated to produce improved predictions. In the following sections, we apply this general workflow to generate accurate    and interpretable structure-property relationships for nitrogen-rich HEMs.
Tetrazoles: Tetrazoles are emerging as an increasingly important class of energetic materials due to their potential to replace nitrated compounds and lead-based primary explosives. 18,34 Unlike their canonical nitro counterparts, tetrazoles do not require forcing nitration conditions for their synthesis and pose lower environmental toxicity. 35 The decomposition mechanisms of tetrazoles have been previously investigated in a number of experimental and computational studies, with the prevailing decomposition mechanism being a stepwise extrusion of molecular nitrogen for both 1,5-and 2,5substituted tetrazoles ( Figure 4A and 4B). 7,8,[36][37][38][39] Among these studies, a number of dynamic isomerization events are observed for various tetrazole substituents (-H, -OR, -NHR), leading to a variety of possible decomposition pathways and rendering the development of generalizable reactivity trends challenging. Furthermore, the existence of two constitutional tetrazole isomers (i.e., 1,5-and 2,5-tetrazoles) requires separate mechanistic consideration in the context of a traditional mechanistic study, and the generality of trends observed in such studies would be challenged by the diversity of substituents encountered in tetrazole energetics.
To address these challenges using our explosophore-based approach on tetrazole HEMs, several considerations were made to adequately parameterize the structural and electronic properties of the tetrazoles in our study: 1) tetrazoles have two different substitution patterns, 2) tetrazoles can bear a plethora of substituents that significantly influence their energetic properties, and 3) many of the tetrazoles in our dataset contain multiple tetrazole explosophores. To adequately parametrize both tetrazole substitution patterns, we constructed descriptors based on a mechanistic analysis of reported decomposition mechanisms of 1,5-and 2,5-tetrazoles and devised a numbering scheme for shared bonds and common features for tetrazoles of either substitution pattern ( Figure 4A and 4B). 7,8,[36][37][38][39] Notably, several key features are maintained regardless of the tetrazole substitution pattern: the bond broken in a stepwise extrusion of molecular nitrogen is the single bond between N 2 and N 3 , a triple bond is formed between N 1 and N 2 in the liberation of nitrogen, N 3 substituted with R 1 and has a lone pair, and there is a double bond between C 4 and N 5 . Recognition of these similarities allowed for a universal description of the tetrazole system through the framework of our descriptor collection workflow. In the case of symmetric ditetrazoles, both tetrazole moieties in each molecule were treated as equivalent and thus only one of the two tetrazoles was parameterized for modeling. This assumption of symmetry is validated by the analysis of the experimental crystal structures of these symmetric ditetrazoles, revealing that ten out of eleven constitutionally symmetric ditetrazoles in our crystal structure database are also structurally symmetric in the solid state. With these considerations in mind, we curated and parameterized several distinct tetrazole datasets to develop QSPRs for the decomposition temperature and impact sensitivity of these HEMs.
Prediction of Tetrazole Sensitivity Properties from Crystal Structures: Parameters were collected for a total of 33 tetrazole HEMs including twenty-four 1,5-tetrazoles and nine 2,5-tetrazoles from crystal structures. Using forward stepwise linear regression, we identified a 3-term statistical model for tetrazole decomposition temperature ( Figure 5A). This model yielded predicted values of Tdec within 44 °C of the experimental value for 87% of all 31 tetrazoles in the dataset with reported Tdec values, with an MAE of 19 °C for the 27 tetrazoles comprising the training set. 40 The remaining four compounds that were predicted with larger errors are shown in Figure 5A and 5C (gray entries).
In this model, the term v(C 4 =N 5 ) is the normalized vibrational stretching frequency of the C 4 =N 5 double bond. We observed that this frequency is influenced by the substituent on the C 4 carbon, 41 efficiently accounting for inductive effects and conjugation with the surrounding π-orbitals of the tetrazole. The negative sign of this descriptor in the model indicates that electron-withdrawing substituents lead to a lower decomposition temperature. The R 1 -N 3 σ* occupancy descriptor is based on NBO analysis. As per the model equation, tetrazoles with a larger σ* occupancy have a lower decomposition temperature. The R 1 -N 3 σ* occupancy is controlled by the electronegativity of the substituent on nitrogen N 3 (H < C < N < O) and the tetrazole substitution pattern (1,5 < 2,5); the parameter correlates with the NBO charge on N 3 with a correlation coefficient of -0.78, demonstrating its relationship to the electronic properties of the ring substituent. Lastly, the max R B1 descriptor is a Sterimol parameter that quantifies the maximal B1 distance of either of the two ring substituents. 42 This statistical model is consistent with the main decomposition pathway proposed in literature, which features a rate-limiting cleavage of the N 2 -N 3 bond followed by nitrogen extrusion from the ring-opened intermediate ( Figure 4A and 4B). 7,8,[36][37][38][39] Electron-demanding substituents at the R 1 and R 2 positions stabilize the ensuing formal negative charge on the C 4 -N 5 motif and a sterically hindered R1 substituent favors the rupture of the ring near N 3 .
As a next step, we sought to validate our tetrazole models using the trigger hypothesis, which states that the bulk decomposition of an energetic material is dominated by the initial decomposition of the explosophore with the lowest activation energy prior to the initiation of an exothermic chain reaction. 43 We explored this idea in the context of the ditetrazole HEMs in our dataset: in cases where an HEM contained two non-identical tetrazole units, we hypothesized that the molecule's bulk Tdec would be closely related to the lower predicted Tdec of either of the two tetrazole explosophores. As such, we used our statistical model for Tdec ( Figure 5A) to generate predictions for an external test set of 12 non-symmetric ditetrazoles and selected the lower of the two predicted values as the predicted Tdec for the HEM. This model predicted nine non-symmetric ditetrazoles with a mean absolute error of 25 °C ( Figure 5A and 5C, green entries). This result serves as a good external validation for our original models and verifies the applicability of the trigger hypothesis for prediction of tetrazole decomposition temperature.
Training Set Training Set Outlier LOO CV Ditetrazole Test Set   Upon closer inspection of our models for Tdec, it became clear to us that several of the tetrazoles in the dataset possessed natural Lewis structures containing formal charges ("charged" NBO structures; highlighted in purple in Figure 6). The presence of these alternative Lewis structures had a significant impact on the accuracy of our Tdec predictions: while the MAE for tetrazoles with standard Lewis structures is 17 °C, the MAE for tetrazoles with "charged" Lewis structures is 46 °C. We thus sought to develop an expanded set of descriptors to better account for these electronic structures in future modeling efforts (vide infra).
The effects of charged tetrazole Lewis structures were also observed when we applied this workflow to the prediction of tetrazole impact sensitivity, which yielded models with poorer performance. Our best model for log(IS) using the crystal structure dataset ( Figure 5A) necessitated the removal of six compounds before satisfactory model statistics were obtained (R 2 = 0.86, Q 2 = 0. 83, MAE = 0.17, K-fold R 2 = 0.75). This can largely be attributed to the contributions of 9 out of 31 tetrazole HEMs with "charged" Lewis structures; while the 22 tetrazoles with "uncharged" Lewis structures were modeled with an MAE of 0.34, the remaining 9 compounds with "charged" NBO structures had an MAE of 0.47. While the MLR equation of the log(IS) model retained good interpretability ( Figure 7A), its shortcomings highlighted the necessity for construction of parameters that accounted for multiple natural Lewis structures.
Tetrazole Decomposition Temperature Prediction from Conformational Searches: Motivated by our initial success in identifying QSPR models for the decomposition temperature of structurally diverse tetrazole HEMs using descriptor sets constructed from crystallographic data, we sought to extend our methods to accommodate ab initio virtual screening of HEMs. Our reliance on crystal structure data would necessitate time-consuming and potentially hazardous laboratory work to test model predictions. To this end, we developed a second workflow for QSPR modeling of tetrazole sensitivity properties wherein 2D structural formulae were converted into 3D conformers using molecular mechanics with MacroModel version 11.8. All conformers within a 5.0 kcal/mol energy difference of the lowest energy conformer were subjected to the same DFT optimizations and single point calculations for parameter acquisition as described in the original workflow. With multiple conformations for each compound in hand, we generated two descriptor datasets for statistical applications: 1) the minimum energy conformer of each HEM according to DFT calculations and 2) Boltzmann averaging of descriptor values across all conformations.
As previously discussed, the multiple natural Lewis structures encountered within the tetrazole dataset ( Figure 6) prompted us to construct additional NBO parameters. Due to the aromaticity of tetrazoles, we started by constructing minima, maxima, sums, and averages of NBO orbital occupancies, energies, bond orders, and energy gaps for the calculated localized π NBOs. Importantly, multiple Lewis structures were unified by identifying NBOs that are related by resonance. For instance, the natural Lewis structure of some tetrazoles contained an N 2 =N 3 double bond ("charged" NBO structures) whereas in the majority of tetrazoles it is localized as a N 1 =N 2 double bond ("uncharged" NBO structures). Hence, these π orbitals were treated as a single structural feature for the design of additional descriptors. Closer inspection of the σ NBO energies and orbital occupancies of tetrazoles with "charged" NBO structures revealed systemic alterations based on the calculated Lewis structure of the tetrazole. Analogous feature engineering was thus also performed on σ NBO orbitals. Taken together, these improvements offered a more uniform description of the π-system and the σ skeleton of tetrazoles in a manner agnostic of their Lewis structures.
Statistical models were generated with this new descriptor set following the same general workflow outlined in Figure 3. Intriguingly, the Boltzmann-averaged descriptor dataset provided better statistical models for thermal decomposition, while the minimum energy conformer dataset proved more effective for prediction of the impact sensitivity (vide infra). This result is consistent with the observation that many of the tetrazoles in our experimental dataset have a melting point below their decomposition temperature. Consequently, these tetrazoles have a high degree of conformational flexibility prior to decomposition, which is suitably approximated by the Boltzmann averaging of the descriptor set. However, decomposition initiated by impact occurs in the solid state, in which HEMs generally adopt a single conformation in a crystal lattice. The statistical model for Tdec derived from the Boltzmann-averaged parameters ( Figure 5B) carries excellent statistical metrics, yielding predicted Tdec values within 30 °C of the experimental value for 90% of the 31 tetrazoles in the dataset, with an MAE of 16 °C for the 28 tetrazoles comprising the training set. The improved performance of this statistical model as compared to the model using descriptors derived from the crystal structure conformers is also attributed to the expanded set of NBO descriptors used in this second generation of modeling. This improvement is highlighted by the inclusion of the average N 1 -N 2 /N 2 -N 3 σ bond order descriptor, which was constructed to account for tetrazoles (3 out of 31) that have a double bond between N 2 and N 3 in their natural Lewis structures. Closer inspection of this descriptor reveals a correlation coefficient with the N 2 -N 3 σ bond order of 0.87, while the descriptor is not correlated with the N 1 -N 2 σ bond order (correlation coefficient = 0.04). The large positive model coefficient for the average N 1 -N 2 /N 2 -N 3 σ bond order in the MLR equation ( Figure 5B) supports a rate-determining N 2 -N 3 bond breakage in the decomposition of both tetrazole classes. The X=X* maximum occupancy descriptor is the highest occupancy in any π* orbital within the tetrazole ring, accounting for any tetrazole resonance structures. The third descriptor in the model is the average calculated NMR shift of all 4 nitrogen atoms in the ring.
To validate this model with an external test set, the decomposition temperatures of our set of 12 non-symmetric ditetrazoles were predicted. The ditetrazoles were parameterized in the same fashion as the training set using structures generated from conformational searches and subsequent DFT computations. As before, the lower predicted value between the two tetrazole moieties was adopted as the predicted property for the HEM in accordance with the trigger hypothesis. This resulted in good predictions for the decomposition temperature for 9 of the 12 ditetrazoles, giving a MAE of 16 °C. The total MAE for all 12 ditetrazoles was 31 °C.

Tetrazole
Impact Sensitivity Prediction from Conformational Searches: Tetrazole impact sensitivity was successfully modeled using the minimum energy conformer descriptor set ( Figure 7B). This model yielded log(IS) predictions for 87% of the 31 tetrazoles within 0.39 of the experimental value with a training set MAE of 0.17. As in the model for Tdec (Figure 5B), the parameter X=X* maximum occupancy appears with a positive coefficient in the model for log(IS). The ave R L parameter is a Sterimol parameter that describes the average length of both substituents on each tetrazole. Three of the four log(IS) outliers ( Figure 7B, gray points) contained a 5-amino-1H-tetrazol-1-ol subunit and were predicted with a higher impact sensitivity than the experimental value, suggesting that this motif may result in intermolecular interactions not captured in the conformational search model. This is further supported by the log(IS) model based on the crystal structure ( Figure 7A), which gave better predictions for all these three compounds ( Figure 5C). The improved performance of the crystal structure model may be attributed to the Hirshfeld parameter de mean describing the average intermolecular distance between molecules in the solid state. The inclusion of the de mean term in this model supports the notion that tighter crystal packing increases the sensitivity of tetrazole HEMs.
Prediction of the ditetrazole test set using this impact sensitivity model in conjunction with the trigger hypothesis failed to produce good predictions ( Figure 7B, green triangles). We propose two explanations for this outcome. Firstly, many of the ditetrazoles used in the external test set are fully conjugated. Our model for tetrazole impact sensitivity largely featured descriptors that describe the π-system of isolated tetrazole motifs. Consequently, the extended conjugation of the non-symmetric ditetrazole test set might produce cooperative effects that our model fails to explain. An alternative explanation is that the trigger hypothesis is simply not applicable for impact-initiated decomposition of tetrazoles. The investigation of this question is an area of ongoing research.

Performance
Improvements from NBO Feature Engineering: The influence of NBO feature engineering on model performance is illustrated through comparison between the performance of the best Tdec and log(IS) models generated using the initial parameter set with the performance of the best models developed with the extended parameter set. A useful metric for the ability of a descriptor set to account for "charged" NBO structures is the comparison of the MAE obtained for the dominant natural Lewis structure and that of the charged set ( Figure 8). Our expanded suite of descriptors substantially reduced the difference in MAE between these two groups of tetrazoles for both the prediction of Tdec (30 to -7 °C) and log(IS) (0.13 to -0.02).
Tetrazole Decomposition Temperature and Impact Sensitivity Prediction Using an Expanded Tetrazole Dataset: After demonstrating proof-of-principle for ab initio prediction of tetrazole Tdec and log(IS), we conducted a comprehensive literature survey using SciFinderⁿ for additional energetic tetrazoles disclosed by Klapötke and Shreeve that did not have reported crystal structure data. This allowed us to add an additional 24 structures to our library and produce an expanded dataset of 57 tetrazole HEMs (Figure 9). We again used molecular mechanics to obtain a series of conformers for each compound and then applied our   statistical modeling methods towards the prediction of Tdec (using a Boltzmann averaged dataset) and log(IS) (using a minimum energy conformer dataset). The 54 tetrazoles in this dataset with reported Tdec values were split into a 60:40 train:test set (33 tetrazoles in training set and 21 tetrazoles in test set) using the Kennard-Stone algorithm 44 to provide a reproducible uniform distribution of sensitivity property values for model development.
The training set of 33 tetrazoles resulted in a 3-term model for Tdec ( Figure 9B) with good statistical metrics. This model yielded predicted Tdec values within 41 °C of the experimental value for 93% of the tetrazoles in the training set with an MAE of 17 °C. Two compounds were excluded from the training set due to their outlier status as evidenced by large prediction errors (70 °C, 78 °C). This model was then validated with two external validation sets: the remaining test set of 21 tetrazoles and the test set of 12 non-symmetric ditetrazoles. Prediction of the 33 compounds in these validation sets resulted in an MAE of 25 °C (four outliers with prediction errors of 68 °C, 72 °C, 100 °C, and 142 °C). The experimental decomposition temperatures of tetrazoles used in this model span a range of over 200 °C, highlighting the utility of this predictive model.

QSPR for Azide HEMs
Prediction of Decomposition Temperature and Impact Sensitivity of Azide HEMs: Organic azides are of interest for their promise as replacements for toxic metal-based primary explosives. 18 In addition to their direct use as energetics, azides are ubiquitous handles in the preparation of heterocyclic HEMs, bioactive molecules, and materials. [45][46][47][48] However, to the best of our knowledge, no QSPRs have been reported describing the sensitivity properties of small molecule azide HEMs. To further extend the application of our methods for sensitivity property prediction, we turned our attention to the 35 azide-containing HEMs in our compound library ( Figure  12D). Many of these HEMs included multiple azide functional groups per molecule (10 of 35 HEMs, for a total of 54 azide explosophores) in addition to non-azide explosophores. As was the case with the tetrazole/ditetrazole datasets, we initially modeled azides containing only one azide explosophore, and then expanded the scope of our prediction to encompass HEMs with multiple azide explosophores.
During curation of the azide dataset, it became apparent that acyl azides possess Tdec values that are categorically lower than the Tdec of azides attached to other functionalities. The best Tdec model generated (vide infra for modeling workflow) for the full azide dataset (N = 34, R 2 = 0.75, Q 2 = 0.69) contained an appreciable univariate correlation (R 2 = 0.47) between Tdec and the σ*(C-N3) occupancy descriptor (Figure 9). For the full azide dataset, the correlation coefficient corresponding to this plot is given as -0.69; however, upon removal of the nine acyl, iminyl, and peroxyl azides, this correlation coefficient is given as +0.68. This categorical difference is easily visualized in Figure 10 and carries an intuitive interpretation. In analogy to a concave upward nonlinear Hammett plot, this indicates that these azides follow a different decomposition mechanism than the remainder of the azide HEMs. 49 This finding is in agreement with the literature, as acyl azides can readily undergo a Curtius rearrangement to furnish the corresponding isocyanate with Figure 10. Univariate correlation between the lowest predicted azide Tdec and the σ*(C-N3) occupancy descriptor. Categorical differences between acyl azides (red), iminyl azides (blue), peroxo azides (gray), and the remaining azide HEMs (black) are apparent. The linear fit corresponding to the entire azide dataset of 34 HEMs given by the red trace is driven by outlier azides. The linear fit corresponding to the remaining 26 azides is given by the black trace. Figure 11. An iterative approach to modeling HEMs with multiple explosophores. Based on the trigger hypothesis, each HEM is represented by one of its explosophores. After an initial phase of MLR modeling, the explosophore that is predicted to be most reactive (lowest Tdec or lowest log(IS)) in the N th iteration is taken forward for dataset of the next iteration N+1. During the MLR modeling steps, judicious removal of outliers may significantly improve the model performance. A quantitative metric for outlier selection is a decreased difference between R 2 and Q 2 upon removal of a suspected outlier HEM. the liberation of molecular nitrogen. 50 This concerted reaction mechanism is not accessible to other classes of azides, which most likely decompose via transient nitrene species and the liberation of molecular nitrogen. 44 As such, models generated including these azides are a useful tool in mechanistic classification but fail to provide the detailed chemical interpretability we desired. This mechanistic divergence prompted us to exclude the nine acyl, peroxo, and iminyl azides in our dataset from further Tdec modeling efforts.
After this initial exploration, we systematically developed models for Tdec for the remaining 26 azide HEMs in our study. We first generated an initial training model using the 19 HEMs containing only a single azide explosophore ( Figure 11, Step 1) see SI for model equation and statistics). We then applied an iterative workflow to expand the scope of our prediction to the remaining 7 HEMs with multiple azide explosophores per molecule. This iterative process is described as follows and is demonstrated in detail in the SI. First, the training model for HEMs with one azide explosophore was used to generate Tdec predictions for each azide explosophore in every HEM in the dataset ( Figure 11, Step 2). For HEMs with multiple azide explosophores, the azide group predicted to be the most sensitive (i.e., lowest predicted Tdec) is added to the training dataset for a subsequent generation of modeling ( Figure 11, Step 3). At this point, all HEMs were now included in this updated training set. A new generation of models was produced from this training set, and the azide explosophore in each molecule with the lowest predicted Tdec was again compiled into a new dataset. This process was repeated until dataset convergence was reached, i.e., once explosophore selection for every HEM in the dataset was the same as in the previous iteration of modeling ( Figure 11, Step 4). In our experience, convergence was achieved within 2-4 iterations of T dec / °C = 157 + 10 σ* C-N occ. + 11 π* 3,N≡N energy -16 d(C↔N) Natural Lewis Structure for 47/54 Azides (30/35 HEMs) this workflow. After three iterations of modeling, our workflow had converged on a final training set and a corresponding predictive model for azide decomposition temperature ( Figure  12C). 51 Figure  12D). Notably, acyl, iminyl, and peroxo azides were successfully predicted in the final model.
Analysis of calculated NBO orbitals for the azide dataset reveals that most azides share a common natural Lewis structure ( Figure 12B). The models for Tdec and log(IS) can both be readily understood and share analogous parameters. Both statistical models contain σ*(C-N3) occupancy descriptor; the coefficient for this descriptor is positive for Tdec and negative for log(IS). This parameter serves as a good readout for the inductive effects of substituents on the carbon of the azide group. Furthermore, both models contain a parameter that describes the NBO LUMO orbital of the N 2 ≡N 3 triple bond. The Tdec model indicates that azides with lower N 2 ≡N 3 NBO LUMO energies are more sensitive, and the log(IS) model suggests that compounds with a smaller N 2 ≡N 3 NBO HOMO/LUMO energy gap are more sensitive. These observations provide a consistent mechanistic picture of azide decomposition via extrusion of molecular nitrogen. Lastly, both models contain a parameter that describes the geometry/steric environment of the azide. In our model for Tdec, azides with a longer azide carbon-terminal nitrogen are predicted to have a lower Tdec. In our model for log(IS), azides with bulkier substituents are more sensitive.
Notably, very low mean absolute errors were achieved for both Tdec and log(IS) (7.3 °C and 0.13, respectively), testifying to the accuracy of this method for predictive modeling of azidecontaining HEMs. Good statistical performance was achieved with removal of very few outliers, despite the diversity of the dataset with regards to the presence of various additional explosophores. Furthermore, the statistical models were constructed from descriptors unique to the azide moiety and were thus largely agnostic of other explosophores contained in the molecule. This corroborates that the azides likely serve as triggers for explosive decomposition, even in the presence of other explosophores.

Conclusion
In summary, we report a series of computational and statistical methods that enabled prediction of decomposition temperature and impact sensitivity across a structurally diverse scope of tetrazole and azide HEMs. The key conceptual innovation in this approach is the decision to interrogate the structural and quantum mechanical properties of HEMs at the level of their explosophores, providing the ability to describe and interpret mechanistic events leading to decomposition while also modeling sensitivity properties. The advantages of this descriptor suite are compounded by the fact that descriptors are localized at the level of individual chemical bonds but can readily be expanded to larger chemical motifs, allowing a common set of parameters to be generated for an explosophore of user-specified structure and a sensitivity property of interest. A comprehensive account of the computational methods used to construct these models is disclosed in the SI.
To the best of our knowledge, this work contains the first QSPRs for the sensitivity properties of tetrazole and azide HEMs in the reported literature. The models disclosed herein contain interpretable terms such as charges, NBO orbitals, and steric parameters, thereby offering chemical insight to guide the development of new HEMs with improved sensitivity properties. In contrast to literature precedence, the models in this work were trained and tested on new HEM datasets with high structural diversity, demonstrating the generality of our methods for chemical featurization of HEMs. Internal and external validation of our models further supports their validity as useful heuristics for rational design and mechanistic study.
The marriage of our explosophore-based lens for sensitivity property prediction with the trigger hypothesis subsequently allowed us to predict sensitivity properties for HEMs of increased structural complexity. Accurate sensitivity property predictions for HEMs containing multiple azide and tetrazole explosophores were obtained through simultaneous modeling of each of the explosophores in these HEMs, an ability unique to our explosophore-based approach. Successful modeling of sensitivity properties through this workflow lends support to the notion that specific explosophores act as molecular triggers for the bulk decomposition of HEMs. However, our efforts to predict the impact sensitivity of conjugated non-symmetric ditetrazoles in accordance with the trigger hypothesis proved unsuccessful, highlighting a limitation of this approach. Impact sensitivity predictions for these molecules may require the construction of additional descriptors that better capture the electronic structures characteristic of these HEMs.
Finally, we demonstrated the application of our methods towards streamlined virtual screening of HEMs. In contrast to our earlier models that utilized parameters calculated from experimental crystal structures, we successfully predicted the sensitivity properties of tetrazoles whose 3D structures were obtained from 2D structural formulae alone. Future expansions of this computational workflow to other compound classes, while not demonstrated in this work, have the potential to enable virtual screening of various other HEM families to improve the identification of viable synthetic targets prior to any experimental work. Derivatization of our workflow to include HEMs with ionic structures or even composite energetic formulations would further support the rational design of future energetic materials.