New Insights into the Crystallographic Disorder in the Polymorphic 1 Forms of Aspirin from Low-frequency Vibrational Analysis

Terahertz time-domain spectroscopy (THz-TDS) is applied to two polymorphs of acetylsalicylic acid (aspirin), and the experimental spectra are compared to lattice dynamical calculations using high accuracy density functional theory. The calculations confirm that forms I and II have very close energetic and thermodynamic properties and also that they show similar spectral features in the far-infrared region, reflecting the high degree of similarity in their crystal structures. Unique vibrational modes are identified for each polymorph which allow them to be distinguished using THz-TDS measurements. The observation of spectral features attributable to both polymorphic forms in a single sample, however, provides further evidence to support the hypothesis that crystalline aspirin typically comprises intergrown domains of forms I and II. Differences observed in the baseline of the measured THz-TDS spectra indicate a greater degree of structural disorder in the samples of form II. Calculated Gibbs free-energy curves show a turning point at 75 K, inferring that form II is expected to be more stable than form I above this temperature as a result of its greater vibrational entropy. The calculations do not account for any differences in configurational entropy that may arise from expected structural defects. Further computational work on these structures, such as ab initio molecular dynamics, would be very useful to further explore this perspective. Here, aspirin is a model system to show how the additional insight from the low-frequency vibrational information complements the structural data and allows for quantitative thermodynamic information of pharmaceutical polymorphs to be extracted. The methodology is directly applicable to other polymorphic systems.


Introduction 37
The crystal structure of one of the most well-known analgesics, acetylsalicylic acid, commonly 38 referred to under its trade name aspirin, was first characterised by Wheatley in 1964. 1 After the 39 initial crystal structure was determined, it was widely discussed whether other crystal forms of 40 aspirin may exist, but it was not until 2005 that Vishweshwar et al. reported the crystal structure 41 of aspirin form II. 2 The form II structure had been identified as an energy minimum by Ouvrard 42 and Price 3 in a crystal-structure prediction exercise, but sub-optimal features of the experimental 43 structure determination left doubts over the validity of the crystal structure and the existence of 44 form II as a discrete polymorph. 4 The origin of these difficulties was explained by Bond et al.,45 who showed that aspirin forms I and II exist as intergrown single crystals, comprising domains 46 of both structure types. 5 Reproducible crystallisation of form II remained challenging, but it was 47 shown in 2010 that domains of the second polymorph of aspirin are introduced reliably in the 48 presence of aspirin anhydride during the crystallisation process. 6 More polymorphs of aspirin 49 have since been discovered. 7 50 In this paper, we focus on the closely-related aspirin polymorphs, form I and form II. 51 There are considerable similarities in their respective crystal structures, which permit the 52 observed intergrowth behaviour. Figure 1 shows the projections of the two polymorphs along the 53 a, b and c axis. The structures are polytypes 8 , having consistent layers lying in the bc planes. For 54 the two forms, the difference is seen in the arrangement of these layers along the a axis. The 55 relationship between the structures can be viewed as a shift of neighbouring layers by half a unit-56 cell length along the c axis (as can be seen in the a axis projection), or alternatively as a 57 reflection of every second layer perpendicular to the b axis. Given the observed intergrowth 58 behaviour, it is evident that the two arrangements must have closely comparable energies. 59 Computational work using density functional theory (DFT) has indicated that the two forms 60 exhibit only a subtle energetic difference, and it was suggested that the two polymorphs were 61 virtually degenerate. 9 Later on, Reilly and Tkatchenko applied a many-body dispersion approach 62 with DFT to investigate their energetic differences at room temperature, which inferred that the 63 stability of form I was due to the coupling of low-frequency phonon modes and collective 64 electronic fluctuations. 10

Terahertz time-domain spectroscopy (THz-TDS) 127
The THz-TDS measurements were performed in transmission using a commercial spectrometer 128 (TeraPulse 4000, TeraView Ltd., Cambridge, UK) equipped with a cryostat (Janis, 129 Massachusetts, USA) and an attached temperature controller (Lakeshore 330, Ohio, USA). The 130 set up was used to acquire measurements in the temperature range between 80 and 400 K. 131 Due to the strong absorption from pure crystals, polycrystalline aspirin powder was diluted in 132 polyethylene (PE) powder (Induchem, Volketwil, Switzerland) to a 5 % w/w concentration using 133 an agate mortar and a pestle by gentle mixing. The powder mixture was then compressed into a 134 pellet of 13 mm diameter using a hydraulic press (Specac Ltd., Kent, UK) at a load of 2 tons, and 135 a blank PE pellet was prepared in the same way to be used as a reference. For each sample, 1000 136 waveforms were acquired and averaged with a spectral resolution of 0.94 cm -1 . A spectrum was 137 first acquired at room temperature to check if the sample pellet was at the proper concentration. 138 The whole system was then cooled to 80K with liquid nitrogen to acquire a spectrum at low 139 temperature that would be more comparable with the simulated spectrum from the DFT 140 calculations. 141

Density functional theory calculations 142
The DFT simulations were performed using the CRYSTAL17 software package. 19 The published 143 crystallographic data from the Cambridge Structural Database (CSD) (ACSALA07; ACSALA17) 144 were used to set the initial atomic positions and lattice parameters. 13,20,21 The subtle differences 145 between the two polymorphs required very strict parameters for the best differentiation. The 146 "extra extra large" grid (XXLGRID) was used to generate 100 k-points in reciprocal space 147 (SHRINK=7) and the tolerance of bielectronic coulomb and Hartree-Fock (HF) exchange 148 integrals were set to 10 -12 , 10 -12 , 10 -12 , 10 -12 and 10 -24 (TOLINTEG). The atomic centered triple-149 Ahlrichs' VTZ basis set (with polarisation) 22 and the PBE density functional 23 with Grimme-D3 150 London dispersion correction 24,25 were applied with a three body Axilrod-Teller-Muto repulsion 151 term (ABC) 26 . The energy convergence criterion for the geometry optimisation was set at 10 -8 152 Hartree, and 10 -10 Hartree for the frequency analysis. The default DIIS convergence accelerator 153 was applied. In addition, two displacements for each atom along each Cartesian direction were 154 calculated for the phonon modes (NUMDERIV=2) and the IR intensities were calculated based 155 on the Berry phase method. 27,28 The cohesive energies of both forms were calculated by 156 subtracting the isolated molecular energy from the total solid-state energy considering the basis 157 set superposition error (BSSE), which originates from the finite basis sets. 29 Select normal modes 158 were selected and the corresponding geometries were scanned along the eigenvectors to obtain 159 more information on these modes (SCANMODE). The Gibbs free energy as well as other 160 thermodynamic properties were calculated from 7 K to 300 K for each structure.

X-ray powder diffraction (XRPD) 164
The bulk solids from the crystallisation trials contained differing degrees of form II domains, as 165 indicated by the relative intensities of the signature form II peaks in the XRPD patterns ( Figure  166 S2). Trace aspirin anhydride was also detected in most samples. The variability in the 167 crystallization outcome is attributed principally to (deliberate) inhomogenous mixing of the 168 initial aspirin/aspirin anhydride solid sample. Three samples were selected for further analysis, 169 which contained minimum quantities of aspirin anhydride and a range of form II peak intensities. 170 The sample referred to as "form I" shows no evidence of form II peaks in the XRPD, "low purity 171 form II" shows form II peaks of intermediate intensity, and "high purity form II" shows the most 172 intense form II peaks ( Figure 2). Here, "purity" refers to the polymorphic composition. Hence, 173 the form I sample straightforwardly represents the aspirin form I structure, the high purity form 174 II sample is the best available representative of the form II structure, and the low purity form II 175 sample comprises the form II structure with a more substantial fraction of form I domains.

Terahertz time-domain spectroscopy (THz-TDS) 179
The maximum absorption coefficient (αmax) was determined by the dynamic range of the 180 terahertz signal. 30 It was checked qualitatively for the experimental results to make sure that the 181 absorption was within the accessible range. Figure S3 shows that the absorption did not exceed 182 the dynamic range of the instrument for frequencies up to 3 THz. Therefore, the spectra obtained 183 for the two polymorphs were valid for further analysis. 184 Figure 3 shows the experimental spectra of form I and the two form II samples at 80 K. The 185 overall resemblance of all parts of the spectra reflects the close structural similarity between the 186 two forms, which is not generally common in other polymorphic systems. In order to facilitate 187 comparison, the spectra were normalised relative to the peak at 1.8 THz, which is a pronounced 188

Optimisation of the crystal structures 199
The experimental crystal structures of both polymorphic forms were optimised using the settings 200 described in the Experimental section and compared with their published experimental data at 201 100 K and 300 K respectively (as shown in Table S1 and S2). The relative errors between 202 experimental and optimised structure are small, demonstrating the accuracy of the calculations. 203 As expected, the calculated physical properties of the two polymorphs (Table 1) show only 204 marginal differences. The total lattice energy of form II is calculated to be 0.293 kJ mol -1 higher 205 than form I, which implies that form I is marginally the more energetically stable state. The 206 energy difference increases to 1.609 kJ mol -1 when considering also the zero-point energies. The 207 cohesive energy (Ecohesive per molecule) of form I is predicted to be 0.560 kJ mol −1 lower than 208 form II prior to any correction for BSSE, but rises to 0.035 kJ mol −1 higher than form II after 209 BSSE correction. A noticeable difference is that the entropy of form I is predicted to be smaller 210 than that of form II. It should be noted that this refers to vibrational entropy, calculated for the 211 idealised form I and form II crystal structures. The calculations do not quantify configurational 212 entropy associated with the structural disorder described for form II single crystals. 213 214

Frequency analysis 218
The simulated vibrational spectra were calculated for the optimised crystal structures, and are 219 compared with the experimental results in Figure 4 (the experimental results of the high purity 220 form II are used onwards for further analysis). Figure 5 shows the simulated spectra of both 221 forms with the bandwidth set to 1 cm -1 in order to display each feature for a clearer comparison. 222 As shown, the peak at around 1.8 THz is predicted for both forms with comparable intensities, 223 hence the decision to normalise the experimental spectra relative to that feature in Figure 3. The 224 three circled areas correspond to the differences identified between the experimental spectra: (i) 225 the feature at 1.53THz is predicted only for form II, which explains why the peak is more

243
To quantify the comparison, the experimental spectra were fitted in Matlab with the spectral 244 features constrained to the frequencies predicted by the simulations, where the peaks were fitted 245 to Lorentzian functions and the baseline was fitted to a power law (Eqn 1). 246

81.: 247
where ! is the amplitude of the peak, ! is the peak width, " ! is the predicted frequency of the 248 peak, d is the relative shift between the simulated active modes, and & + was used to fit the 249 baseline. Application of the shift d is common practice and is mainly due to the difference in 250 temperature between the experimental (80 K) and simulated (0 K) results as well as additional 251 effects of anharmonicity. The fits are shown in Figure 6, and the resulting parameters are shown 252 in Table 2. The highlighted features in both samples are not predicted in the corresponding form, 253 but rather in the other polymorph. The fact that it is possible to fit the experimental spectra very 254 closely using this methodology, but that both spectra contain residual features indicative of the 255 other polymorphic form, substantiates the hypothesis that the two polymorphs are intergrown in 256 the samples analysed, and furthermore demonstrates that this feature can be detected by the THz 257 measurements. It is also notable that the baseline fitting parameters of form II in Table 2 are 258 larger than those required for form I, consistent with the expectation that the form II sample is 259 more disordered. This can also be reflected from the wider peak width in form II on average, as 260 peak broadening is a significant effect to the terahertz absorption spectra caused by disorder in 261 crystalline materials. 32    It is evident in Figure 6 that the first peak of form II at 1.0 THz is not fitted as well as the 274 other features. Therefore, further calculations were performed to examine the energies of the 275 structures with the atoms displaced in steps along the eigenvectors corresponding to the first 276 spectral features of each form. Figure 7 shows that the difference between the energy calculated 277 with the harmonic approximation and the real total energy of form II was much larger than that 278 of form I. This implies that the influence of anharmonicity is much larger for the first peak of 279 form II than for form I, which can account for the less effective fitting that is observed. In 280 addition, it was found that the methyl groups (-CH3) in form II were much more active than in 281 form I within the whole low-frequency range (0.

Free energies of the polymorphs 298
In Section 3.3 (Table 1), it was shown that the total lattice energy of form I is calculated to be 299 marginally lower than that of form II at absolute zero. To further investigate the relative stability 300 of the polymorphs at real temperatures, the Gibbs free energy curves of both forms were 301 calculated from 7K to 300K (shown in Figure S4 and the lower bound being limited by the 302 CRYSTAL17 software package 28 ). A turning point is predicted at 75K, where the Gibbs free 303 energy of form I becomes larger than that of form II, which infers that form II is more stable at 304 ambient temperature. The dynamic calculations show specifically that the stability of form II 305 arises as a result of its greater vibrational entropy. However, two caveats apply: (1) the free 306 energy calculation is based on the harmonic approximation, which would be expected to 307 influence the predicted value of the transition point; (2) the calculations refer to the idealised 308 form I and form II crystal structures, and do not consider structural disorder associated with the 309 intergrown nature of form I / form II single crystals. 310

Conclusions 311
This work applies THz-TDS and ab initio simulations to explore the two highly similar 312 polymorphs of aspirin. The results show that, even for these two nearly identical crystal 313 structures, THz-TDS can reveal spectral differences based on its sensitivity to inter-and intra-314 molecular forces. With the help of DFT simulations, slight differences observed for the aspirin 315 polymorphs and the spectra provide further solid evidence of the intergrown nature of the two 316 forms. A number of calculated physical properties are compared for the two forms and most of 317 them are found to be quite similar, reflecting the high degree of structural similarity. However, 318 form II shows a relatively larger vibrational entropy, which yields a lower free energy than form 319 I at temperatures above 75 K. Larger parameters for the baseline fitting of the THz-TDS spectra 320 also indicate a higher degree of structural disorder for form II, consistent with established 321 expectations for this polymorph. Further computational work on a supercell structure 322 incorporating defects would be very valuable to investigate the influence of structural 323 imperfections in the system and could also add insight into possible phase transitions between 324 the aspirin polymorphs. However, a highly efficient computational plan will be required as well 325 as a large amount of computational resource to pursue this work further. 326

Acknowledgement 327
Qi Li would like to thank the Chinese Scholarship Council (CSC) for supporting her Ph.D. study. 328 The computational work was supported via our membership of the UK's HEC Materials 329 Chemistry Consortium, which is funded by EPSRC (EP/L000202, EP/R029431, EP/T022213), 330 and this work used the ARCHER UK National Supercomputing Service. 331