Appraisal of dispersion damping functions for the effective fragment potential method

The effective fragment potential (EFP) is a polarizable force field whose physically-motivated functional form is parameterized in an automated way from ab initio calculations, and whose dispersion potential has been suggested as a correction for Hartree-Fock or density functional theory calculations. However, the parameter-free dispersion damping potentials that are currently used in EFP do not follow from a rigorous derivation and do not satisfy simple limits for the dispersion energy. We introduce several new damping expressions that correct these deficiencies, then evaluate their performance alongside existing damping functions using a new database of ionic liquid constituents. This data set, which we call IL195×8, consists of complete-basis coupled-cluster interaction energies for 195 ion pairs at each of 8 different intermolecular separations. Ultimately, we recommend a new parameter-free dispersion damping function as a replacement for the one that is currently used in EFP. GRAPHICAL ABSTRACT


Theory
The dispersion component of the EFP energy is expressed in terms of C 6 coefficients that represent the leadingorder dipole-dipole contribution to dispersion [3].The C 6 coefficients are obtained from the well-known expression involving frequency-dependent, isotropic atomic polarizabilities (α p ) evaluated at imaginary frequencies [41][42][43][44][45].That expression, which is usually attributed to Casimir and Polder [46], is given by [44,[47][48][49][50][51]] The total dispersion energy is In EFP, the centers p and q are not atoms but rather centroids of localized molecular orbital (LMOs), whose spherically-averaged polarizabilities α p are used in Equation (1).Like all other dispersion potentials of the −C 6 /R 6 form, Equation (2) exhibits a singularity as R → 0 and a damping function is required to avoid divergence at short range.
In this work, we focus on the 'parameter-free' version of EFP, sometimes called EFP2 to distinguish it from an earlier model that involved some fitting [4].The dispersion damping function that is used in this version of EFP is intended to be a modification of the Tang-Toennies damping function [52], The order n is taken to match R −n pq in the dispersion expression, meaning n = 6 for use with Equation (2).The exponential parameter b is often written in the form where a 1 and a 2 are additional parameters and R vdW pq is an effective van der Waals (vdW) contact distance between centers p and q.Because EFP uses spherical Gaussian LMOs, Slipchenko and Gordon (SG) introduced a Gaussian analogue of Equation (3) [53], namely As compared to the summation in Equation ( 3), this modified form contains only even powers of R pq but it terminates at the same order, R n pq .This complements the denominator in the dispersion expression to all even powers R −2k pq .Noting that the overlap integral for two spherical Gaussian functions with exponents ζ and center-tocenter distance R pq is SG eliminate the parameter ζ in Equation ( 5) by substituting the exponent that is obtained during localization of the canonical MOs.The resulting expression is written in terms of the overlaps S pq between LMOs, rather than the distance R pq per se: [53] This eliminates any fitting parameters beyond the Gaussian exponents that are taken from standard Gaussian basis sets.In Ref. [53], however, the summation in Equation ( 7) is truncated at order (n − 2)/2 for n = 6, resulting in This is the form that the functions takes in the libEFP library [54], and thus defines what we will call the 'original' version of the parameter-free EFP dispersion damping function.
In order to be consistent with the TT damping function, or even its Gaussian modification in Equation ( 5), the damping function in Equation ( 8) ought to contain one additional term.That way, the damping expression meets the dispersion potential with a commensurate power of R n pq .Adding the missing term to Equation (8) affords where the four terms in square brackets are intended to scale like R 0 pq , R 2 pq , R 4 pq , and R 6 pq , respectively, as in Equation ( 5) with n = 6.Instead, the expression in Equation (8) has been applied to the C 7 component of EFP dispersion [55], despite the absence of any term in the numerator with a distance dependence greater than R 4  pq .Furthermore, while the ansatz in Equation ( 6) does eliminate parameters from the damping function, it also makes the tacit assumption that all pairs of LMOs have the same Gaussian exponent, ζ .If centers p and q have different exponents α and β, then a more appropriate expression is Setting γ = αβ/(α + β) and using as the damping function in Equation ( 5), one obtains a slightly different form for the damping function: For n = 6 this is and we call this the 'revised' parameter-free dispersion damping function.
The original parameter-free damping function in Equation ( 8) was introduced in 2009 [53], but has been updated in more recent work to incorporate odd powers of R.This is intended to provide a more consistent expression once a C 7 term is introduced in the EFP dispersion energy.The updated damping function is [19,20] which we will call a 'generalized' EFP dispersion damping function.However, the erroneous assumption of a single Gaussian exponent, leading to Equation ( 7) in the original formulation, is repeated in this more recent expression.As such, we also introduce a 'revised general' damping function of the form where we have not made any assumptions about the nature of the Gaussian exponents.It is worth noting that neither Equation ( 14) nor Equation ( 15) is bounded on the interval [0, 1], which is undesirable from a rigorous standpoint.This has likely gone unnoticed in practice because divergences occur only very close to R pq = 0 and in most real systems the steric repulsion will preclude access to these problematic geometries.Finally, we introduce two novel damping functions to be investigated alongside those discussed above.First, we write an exact Tang-Toennies expression in terms of the overlap, which puts the generalized formulas presented above on a more rigorous foundation: Using the overlap expression in Equation ( 11), one may eliminate the parameter and thus express the damping function in Equation ( 16) in the equivalent form Although this expression is bounded on [0, 1], it is likely to overdamp the interaction as a consequence of the exponential (rather than Gaussian) dependence on distance.
In order to avoid overdamping, we will also test a softer damping function based on the Becke-Johnson (BJ) form [56][57][58][59]: Recall that in modifying Equation (3) for use with EFP, Gaussian exponents were taken as stand-ins for the Tang-Toennies damping parameter b in Equation ( 4).We apply the same substitution here, only using the general form of the Gaussian overlap exponents, γ = αβ/(α + β).Note that γ in Equation ( 17) has dimensions of (length) −2 .Manipulating that expression to substitute a γ -dependent factor in place of b n in the denominator of Equation ( 19) affords the expression This overlap-based BJ damping function retains the parameter-free philosophy of EFP but is trivially extensible to odd powers of R without resorting to augmentations that render the function unbound on the interval [0, 1].
Table 1 summarizes the damping functions introduced above.A successful damping function must prevent E disp from diverging as R → 0, regardless of the fine details of how it performs otherwise.This limit provides a useful first assessment of the validity of any damping function, and Table 1 reports the the limiting value lim R→0 (−f 6 /R 6 ) for each damping function.The original EFP damping function yields a finite, negative value for this limit.As such, while this function does successfully prevent divergence of the dispersion energy, it is not truly a Gaussian analogue of the TT damping function because the latter damps the dispersion energy to zero at R = 0 [52].Failure to satisfy this limit is not fatal to the use of damped EFP dispersion, but it does exemplify the errors associated with early truncation of the original Tang-Toennies expression.The revision suggested in Equation (12) restores the proper limit, namely, lim R→0 (−f 6 /R 6 ) = 0.
The generalized EFP dispersion damping expression [Equation (14)], introduced to accommodate C 7 dispersion [19,20], does not prevent a singularity as R → 0 and the revision introduced in Equation ( 15) does not rescue it from this fate.In order to turn the generalized expression into a formally valid damping function, we abandon the concept of Gaussian damping altogether and return to the exponential Tang-Toennies expression in Equation ( 16), as it restores the proper R → 0 limit.Finally, the parameter-free BJ damping function in Equation ( 20) exhibits a small but finite limit as R → 0.

Computational methods
For a numerical assessment of these various damping expression we turn to the HF plus dispersion method, which we denote as HF+D(EFP) to indicate that the dispersion correction is obtained from EFP. Various forms of the +D(EFP) correction are considered, using the same form for the dispersion energy [Equation ( 2)] but with various damping functions.
As an example of a 'real-world' application where the HF+D approach might be preferred to DFT+D, we consider a data set of ion pairs corresponding to common constituents of room-temperature ionic liquids.For such systems, Grimme et al. [60] report that DFT calculations sometimes face convergence issues related to delocalization error and unphysical charge transfer, at least when semilocal functionals are employed.The HF+D(EFP) method provides a potentially useful alternative in such cases, and indeed could be used as a general way to probe whether delocalization error is problematic in a particular case [61,62], without sacrificing the quantitatively important dispersion interaction in room-temperature ionic liquid (IL) systems [63][64][65][66], where dispersion plays a much larger role as compared to ionic solids and its short-range behavior is important [63,64].
The data set assembled for this work is a subset of the IL-2013 database [67], consisting of 195 unique ion pairs.We shift each dimer along the vector connecting the cation and anion centers of mass, in order to generate eight different geometries for each dimer corresponding to intermolecular separations ranging from 0.9R e to 2.0R e , where R e is the equilibrium separation from the IL-2013 database.We call this data set IL195×8, and it consists of 1,560 individual structures whose interaction energies we have computed at the level of coupledcluster theory with single, double, and perturbative triple excitations [CCSD(T)], using the domain-localized pair natural orbital (DLPNO) implementation of CCSD(T) [68,69] in the ORCA program [70].Tight cutoffs are used because the accuracy of the DLPNO approximation is known to be quite sensitive to these thresholds for noncovalent interaction energies [71,72].DLPNO-CCSD(T) interaction energies were extrapolated to the complete basis-set (CBS) limit using a two-point formula [73] with aug-cc-pVTZ and aug-cc-pVQZ.Additional details are provided in the Supplementary Material.
EFP parameters for each cation and each anion in the IL195×8 data set were computed at the HF/6-311++G(3df,2p) level, using the GAMESS program [74].This is the recommended basis set for parameterizing EFP [75].We also computed EFP parameters for the S66×8 data set [76], for which the monomers are smaller, and in that case we used the def2-TZVPPD basis set [77].(Both of these basis sets perform similarly for intermolecular interactions [78].)In all cases we use the same (rigid) monomer geometries and thus the same EFP parameters for each of the eight intermolecular separations.The EFP energy calculations discussed below were performed at the HF+D(EFP) level using a locally-modified version of the Q-Chem program [79], which includes an interface [80] to the libEFP library [54].

Results and discussion
Mean absolute percentage errors for the IL195×8 data set, comparing HF+D(EFP)/6-311++G(d,p) calculations to CCSD(T)/CBS benchmarks, are shown in Figure 1.With the notable exception of the Tang-Toennies damping function, each of the various damping procedures performs similarly with errors of about 2%.The Tang-Toennies function, however, damps E disp to such an extent that the dispersion-corrected method is no better than plain HF theory without any dispersion correction at all!In the absence of a dispersion correction, or when the Tang-Toennies damping function is used, HF theory underbinds the IL195×8 complexes by an average of almost 7 kcal/mol, as shown in Figure 2. In fact, this underbinding was the original motivation for replacing the exponential in the Tang-Toennies damping function with a Gaussian function [53].
Setting aside the case of the Tang-Toennies damping function, the overall accuracy of the various damping functions is similar when measured in percentage terms.However, maximum errors in Figure 1(b) highlight subtle nuances in the behavior of each damping potential as R → 0 and each of the other damping functions systematically overbinds these dimers by 1-2 kcal/mol; see Figure 2.This is consistent with results indicating that EFP consistently overestimates dispersion interactions in IL constituent dimers, relative to third-order SAPT calculations [81].The original and generalized damping equations exhibit slightly smaller errors than their revised counterparts, although this indicates some error cancellation since only the revisions have a rigorous theoretical foundation.The BJ damping function affords results comparable to those of the original EFP damping function but without ad hoc approximations such as missing terms.We recommend this version for general use with EFP dispersion.
To further investigate the origin of the exceptionally larger percentage errors, we decomposed the IL195×8 dataset into subsets consisting of the protic species (77 dimers) versus aprotic ones (118 dimers).In the IL-2013 database from which we draw our structures [67], the aprotic dimers are mostly imidazolium-based and might be expected to have rather different dispersion energies as compared to the protic dimers, which typically contain alkyl ammonium cations [66,[81][82][83].In our data set, dispersion accounts for a slightly larger percentage of the total interaction energy for the aprotic dimers as compared to the protic dimers, and error statistics for both subsets are presented in Figure 3.
The mean percentage errors are not significantly different between the protic and aprotic data sets but the maximum errors are quite different.Protic ILs exacerbate the underdamping problem identified in Figure 2 because hydrogen bonding reduces the average interatomic distance, and the ion-pair thus samples more of the the short-range behavior of the damping function, and maximum percentage errors are larger (as compared to the aprotic set) despite the dispersion energies themselves being slightly smaller.Erroneous short-range behavior might be alleviated by considering higher-order dispersion corrections, beyond C 6 .While C 7 dispersion has been developed for use with EFP [55], and C 8 coefficients have been estimated from the C 6 coefficients [84], these higher-order dispersion terms have not been incorporated into the Q-Chem/libEFP interface [80].As such, dispersion terms beyond C 6 are not considered in this work.
Finally, we have also assessed the performance of HF+D(EFP) on the S66×8 data set of dimers [76], where the monomers are all charge-neutral, in order to test whether the results above are somehow an artifact of the IL data set.(Dimers containing acetylene were excluded from these calculations because the Q-Chem/libEFP interface does not handle linear fragments at the present time.)Error statistics are shown in Figure 4 and suggest that the general behavior is quite similar to what we observed for IL constituent dimers.The 'no dispersion' method is conventional HF theory whereas other entries are HF+D(EFP) with various damping functions.In two cases, the maximum error exceeds 8.5 kcal/mol and is cut off in this graph.

Conclusions
We have examined various flavors of HF+D(EFP) using two sizable data sets of noncovalent dimers.Overall, we find that the damping functions currently used for EFP dispersion are unsatisfactory from a formal standpoint as the power series in the modified Tang-Toennies damping function is truncated too early.Furthermore, replacement of an empirical damping parameter with a spherical Gaussian overlap integral fails to account for asymmetric Gaussian exponents on different centers.These errors lead to damping functions that do not satisfy correct limiting conditions.In particular, a generalized damping function that was introduced to extend the original formulation to odd powers of R (in order to incorporate R −7 dispersion) is divergent as R → 0 and thus does not serve as a damping function at all when the intermolecular separation is small.Ultimately, we recommend the parameter-free, BJstyle dispersion damping function introduced in Equation ( 20) as a replacement for the modified Tang-Toennies function that is presently employed in EFP.For the data sets examined here, the HF+D(EFP) method in conjunction with new damping function exhibits an accuracy that rivals existing EFP damping functions, yet the new approach is based on a correct theoretical framework and is extensible to odd powers of R. Given the extensive use of EFP across myriad computational chemistry applications, we expect that following this recommendation will minimize errors in many future studies by putting EFP dispersion on a more rigorous footing.

Figure 1 .
Figure 1.Error statistics in HF+D(EFP)/6-311++G(d,p) interaction energies for the IL195×8 data set, considering: (a) equilibrium geometries only, versus (b) the entire data set, including intermolecular separations ranging from 0.9R e to 2.0R e for each ion pair.Orange bars are mean absolute percentage errors and gray bars are maximum percentage errors, as compared to CCSD(T)/CBS benchmarks.The 'no dispersion' method is conventional HF theory whereas other entries are HF+D(EFP) with various damping functions.

Figure 3 .
Figure 3. Error statistics in HF+D(EFP)/6-311++G(d,p) interaction energies for the entire IL195×8 data set compared with protic and aprotic subsets.Colored bars are mean absolute percentage errors and gray bars are maximum percentage errors, as compared to CCSD(T)/CBS benchmarks.The 'no dispersion' method is conventional HF theory whereas other entries are HF+D(EFP) with various damping functions.

Figure 4 .
Figure 4. Error statistics in HF+D(EFP)/def2-TZVPPD interaction energies for the S66×8 data set (with acetylene complexes removed) as compared to CCSD(T)/CBS benchmarks.Orange bars indicate the mean absolute errors and gray bars are maximum errors.The 'no dispersion' method is conventional HF theory whereas other entries are HF+D(EFP) with various damping functions.In two cases, the maximum error exceeds 8.5 kcal/mol and is cut off in this graph.

Table 1 .
Definitions of various damping functions for EFP dispersion.a