Systematic Evaluation of Counterpoise Correction in Density Functional Theory

: A widespread belief persists that the Boys − Bernardi function counterpoise (CP) procedure “overcorrects” supramolecular interaction energies for the effects of basis-set superposition error. To the extent that this is true for correlated wave function methods, it is usually an artifact of low-quality basis sets. The question has not been considered systematically in the context of density functional theory, however, where basis-set convergence is generally less problematic. We present a systematic assessment of the CP procedure for a representative set of functionals and basis sets, considering both benchmark data sets of small dimers and larger supramolecular complexes. The latter include layered composite polymers with ∼ 150 atoms and ligand − protein models with ∼ 300 atoms. Provided that CP correction is used, we find that intermolecular interaction energies of nearly complete-basis quality can be obtained using only double-ζ basis sets. This is less expensive as compared to triple-ζ basis sets without CP correction. CP-corrected interaction energies are less sensitive to the presence of diffuse basis functions as compared to uncorrected energies, which is important because diffuse functions are expensive and often numerically problematic for large systems. Our results upend the conventional wisdom that CP “overcorrects” for basis-set incompleteness. In small basis sets, CP correction is mandatory in order to demonstrate that the results do not rest on error cancellation.


INTRODUCTION 1.Density Functional Theory for Noncovalent
Interactions.The description of noncovalent interactions within density functional theory (DFT) has improved dramatically over the past two decades, relative to a starting point where generalized gradient approximations (GGAs) predict interaction potentials that are unbound for dispersiondominated complexes such as rare-gas dimers or even (C 6 H 6 ) 2 . 1 The first functionals to deal with this problem in a generally effective way were those with empirical atom−atom dispersion potentials added ad hoc. 1,2These dispersioncorrected or "DFT+D" methods remain tremendously important in contemporary DFT, 1−3 and modern versions of the dispersion corrections resemble proper atomic C 6 coefficients in the separated-atom limit. 4,5(Double counting of correlation effects at van der Waals contact distances does remain an issue, however. 1,6,7−15 Rather, because the semilocal meta-GGA framework has a much longer range in real space, due to its dependence on the density Laplacian and the kinetic energy density, meta-GGAs afford nonvanishing interactions in the region of nonbonded close contacts (unlike typical GGAs), and noncovalent interactions may therefore emerge upon sufficient parameterization.Separately, the development of nonlocal correlation functionals, 16−22 which have the right physics to describe dispersion from first principles, has paved the way for a new generation of empirical functionals 23−25 based on the B97 model. 26At present, these B97-based functionals are among the best performers for noncovalent interactions, with accuracy approaching 1−2 kcal/mol for standard benchmark data sets. 3Other methods attempt to extract in situ atomic C 6 (or C 8 , C 10 , ...) coefficients from a DFT density for a first-principles approach to modeling dispersion.−35 Finally, there is double-hybrid (DH-)DFT, 36,37 where a fraction of the second-order Møller−Plesset (MP2) correlation energy is introduced.−41 In DFT, noncovalent interaction energies ΔE int are typically computed via the supramolecular approach: −46 This reflects the fact that in the dimer calculation (E AB ), either monomer may use basis functions centered on the other to improve the quality of its wave function, but that possibility is precluded in the monomer calculations (E A and E B ).−50 The Boys−Bernardi "function counterpoise" (CP) procedure is a means to sidestep this problem, which dates to the early days of quantum chemistry. 51The CP procedure recognizes that eq 1 is an unbalanced approximation unless all three energies are computed in the same (dimer) basis set.−44 When small basis sets are used, |ΔE int | is much too large as compared to the complete basis set (CBS) limit computed at the same level of theory, but |ΔE int | may be too small upon CP correction.This observation is not an indictment of the CP procedure per se because BSSE is interwoven with basis-set incompleteness error (BSIE).Due to partial compensation of these effects, 55 it has been suggested that fixing one of these issues (namely BSSE, by means of the CP procedure) without fixing the other (BSIE, by use of larger basis sets) results in an approximation that is also unbalanced. 53The use of a "half-CP" procedure, 56 which averages the CP-corrected and uncorrected values of ΔE int , can be seen as a remedy for this imbalance.For correlated wave function methods, the half-CP average converges to the CBS limit faster than either the corrected or uncorrected interaction energy on its own. 49,50,55or DFT, there is an additional consideration in that the performance for noncovalent interactions varies greatly from one functional to another, and finite-basis error may offset inherent functional error in some cases. 57To the extent possible, one should therefore attempt to uncouple these errors by examining the performance of DFT in the CBS limit.Although the CP procedure has been carefully benchmarked for correlated wave function models, 50,55 its behavior in DFT calculations has not been examined so systematically.A recent comparison with explicitly correlated wave function results recommended CP correction for converging DFT to the CBS limit, 56 although the selection of basis sets and functionals was somewhat limited.Another assessment of dispersion-corrected GGA functionals using Dunning basis sets concluded that CPcorrected aug-cc-pVDZ results were close to quadruple-ζ results but that uncorrected aug-cc-pVDZ results were not. 58he systematic examination contained herein considers a wider variety of functionals and basis sets that are more typically used for DFT calculations.
With the exception of double-hybrid functionals, it can be expected that BSSE is smaller for DFT than it is for correlated wave function methods due to the more rapid basis-set convergence of DFT that originates in the absence of electron coalescence cusps.Dependence on virtual orbitals also increases the BSSE associated with correlated wave function methods. 52The much slower convergence of MP2 as compared to traditional semilocal or hybrid DFT underlies a recent recommendation to use the half-CP procedure with double-hybrid functionals versus "full CP" for other functionals. 56In the early days of molecular DFT calculations, small Pople-style basis sets were quickly judged to be adequate, 59 and this recommendation seems to have been ported to noncovalent problems without careful calibration.However, the conventional wisdom that double-ζ basis sets are adequate for DFT does not always hold for modern meta-GGAs and B97-based functionals, which converge more slowly than their predecessors with respect to both basis set 23,24,60 and quadrature grid. 24,61,62In view of this, a thorough assessment of CP correction in DFT seems timely.
1.2.History of CP Correction.We consider the historical record in an effort to understand the emergence of a conventional wisdom that the Boys−Bernardi procedure "overcorrects" for BSSE.2][53][54]63 Very early literature on the efficacy of CP correction is muddled by the use of low-quality basis sets, for which both BSSE and BSIE are sizable. Its difficult to take seriously the early criticisms of CP correction based on minimal-basis calculations, 64 where BSSE can be so large as to convert repulsive interactions into attractive ones.65 There was also an early suggestion 66 that the Boys−Bernardi procedure erroneously allows the occupied orbitals of one monomer to occupy regions of space that should be taken up by occupied orbitals of the partner monomer in the supramolecular complex, thus causing the CP procedure to underestimate the true interaction energy.This view was later shown to be false, however.67−69 In fact, spatial restriction of the occupied orbitals upon formation of the A•••B complex lies at the heart of the Pauli exclusion principle!44 The erratic nature of BSSE in small basis sets 70,71 partly motivated the development of intermolecular perturbation theory in the 1970s 72 because this approach is inherently free of BSSE.73 Gaussian basis sets had become more systematized by the 1980s, and a 1988 review 74 observes that it is "generally accepted" that the CP correction brings Hartree−Fock interaction energies closer to the CBS limit, except in the case of minimal basis sets.However, the effectiveness of the CP procedure was questioned in a 1985 study by Schwenke and Truhlar, 75 who performed calculations on (HF) 2 using 34 different Pople-style basis sets and found no systematic improvement when CP was used.That said, there was little variation among the basis sets tested, and a subsequent study of alanine dimer using many of the same basis sets reached the opposite conclusion, namely, that CP correction typically improves the results.76 Other work from the same time period suggested that the CP procedure actually insulates the results from otherwise erratic changes in ΔE int upon relatively small changes in the basis set.77 This understanding was summarized in a 1994 review of the "overcorrection debate".44 The Boys−Bernardi procedure is not only compatible with the Pauli principle, 67 but furthermore, CP-corrected supramolecular calculations afford results of comparable accuracy to intermolecular perturbation theories that are inherently free of BSSE.43,67−69 (In fact, supramolecular perturbation theory calculations may afford nonsensical results for energy components if these are not corrected for BSSE.43 ) Similarly, comparison of CP-corrected MP2 to the corresponding "chemical Hamiltonian" version of that method, 78 which is inherently free of BSSE, results in excellent agreement for basis sets of moderate quality.79,80 By late 1990s (but arguably much earlier than that), 42−44 it was therefore evident that CP correction generally improves the quality of supramolecular interaction energies. Whre exceptions can be found in the older literature, they are almost always attributable to the use of low-quality basis sets.Nevertheless, reviews by van Lenthe et al. in 1987 42 and by the same authors again in 1994 44 bemoan the persistent and widespread belief that the Boys−Bernardi procedure "overcorrects" as well as the notion that the CP correction is merely an "estimate" of the BSSE rather than a well-defined procedure that recognizes the origin of the problem (namely, imbalance in the supramolecular formula of eq 1) and eliminates it, essentially by definition.
To a significant extent, this older discussion was hindered by inability to reach the CBS limit for systems with more than a few atoms or to obtain reliable ab initio benchmarks.Those problems have been overcome, and correlated wave function benchmarks clearly indicate that CP-corrected and uncorrected energies converge to the CBS limit from opposite directions. 50e regard this as an incompleteness problem rather than an "overcorrection" because the CP-corrected results extrapolate more smoothly to the CBS limit, resulting in smaller error bars. 50(This behavior was originally predicted by Dunning. 81) Half-CP results converge even faster, 50,55 and other molecular properties also converge more smoothly with CP correction than without. 47Reported cases where uncorrected results converge faster to the CBS limit, 52 or where the basis-set error (relative to the CBS limit) is larger with CP correction, 54 typically involve systems where the difference between the CPcorrected and uncorrected results is quite small, whereas the systematic study in ref 50 considered a wide range of cases.As such, we conclude that there is nothing in wave function theory to suggest that CP correction is ill-advised.
1.3.Overview of the Present Work.Whereas the CP correction for correlated wave function methods has been considered systematically, 50,55 as has that of double-hybrid functionals, 56 the latter constitute a relatively small niche in the overall pantheon of DFT methods and the CP correction has not been considered systematically for other classes of functionals.In addition, we want to understand how basis sets of double-and triple-ζ quality behave, as these are the largest basis sets that are typically employed in DFT calculations.We will therefore examine the effects of CP correction for interaction energies computed using a set of density functionals that perform well for noncovalent interactions, using basis sets as small as 6-31G*.These tests span a range of system sizes because BSSE is size-extensive and large supramolecular complexes exhibit dramatically larger CP corrections as compared to small dimers.Finally, we will consider whether CP correction is more or less economical than simply enlarging the basis set, as a strategy to obtain converged DFT/CBS values of ΔE int .

METHODS
Basis sets tested here include Dunning's correlation-consistent sequence, aug-cc-pVXZ (with X = D, T, and Q); 82,83 Karlsruhe "def2" basis sets through def2-QZVPD; 84,85 and Pople basis sets 6-31G*, 6-31+G*, 6-311G*, and 6-311+G*. 86Pople basis sets use a common orbital exponent for s and p functions in a given shell, which makes them much more efficient than alternatives with a comparable number of functions, 87,88 provided that the electronic structure program takes advantage of this simplification.As such, Pople basis sets continue to see widespread use in DFT calculations, especially in large systems.A double-ζ basis set called DZVP, 89 which is not in widespread use, is also tested for large supramolecular complexes because it is reported to afford accurate interaction energies for small dimers, without the need for CP correction. 90(In the basis set exchange, 91 this basis set is called DGauss-DZVP.) We selected a small set of functionals that perform well for noncovalent interactions. 3These include BLYP+D3(BJ), a GGA functional that employs the D3 empirical dispersion correction 4 with a Becke−Johnson (BJ) damping function; 6 PBE0+D4, a hybrid GGA using the relatively new D4 dispersion correction; 5 ωB97X-V, 23 a hybrid GGA with nonlocal VV10 correlation; 21 and finally the hybrid meta-GGAs ωB97M-V 24 and M06-2X. 10Extensive benchmarking suggests that ωB97X-V, ωB97M-V, and BLYP+D3(BJ) are among the best all-around options for noncovalent interactions, 3 although M06-2X remains widely used in that capacity and PBE0+D4 performs very well for large supramolecular complexes. 39We also consider the dispersioncorrected M06-2X+D3(0) functional, where "D3(0)" indicates the original damping function developed for D3, 2,4 and finally the meta-GGA functional M06-L. 92The latter performs less well for noncovalent interactions but exhibits exceptionally poor convergence properties with respect to basis set, 60 making for an interesting test.A few double-hybrid functionals are also tested: PBE-QIDH, 93 B2GP-PLYP, 94 ωB97X-2(LP), 95 and ωB97M(2). 24For these functionals, the MP2 correlation energy is evaluated within the resolution-of-identity approximation using orbitals obtained from the underlying hybrid functional.
The overall performance of these methods is illustrated in Table 1 using several noncovalent data sets.Importantly, the DFT calculations used to compute these error statistics were performed at or near the CBS limit and are compared to benchmark values of ΔE int obtained at the level of coupledcluster theory with single, double, and perturbative triple excitations [CCSD(T)], also extrapolated to the CBS limit.As such, these error statistics probe the inherent accuracy of DFT itself, free from BSSE or BSIE.The data in Table 1  intended to demonstrate what level of accuracy is presently feasible with available exchange−correlation functionals.In the rest of this work, we will mostly be concerned with convergence to the CBS limit for a particular functional and therefore subsequent error statistics will be defined with respect to that limit (DFT/CBS), rather than comparing to CCSD(T)/CBS results.Often in DFT calculations, quadruple-ζ energies are taken to be equivalent to the CBS limit. 38,58,60Except for double-hybrid functionals, we will define the DFT/CBS limit using the average of CP-corrected and uncorrected values of ΔE int computed using a quadruple-ζ basis set, namely, aug-cc-pVQZ for small dimers and def2-ma-QZVP 88 for larger systems.For dimers in the S66 data set, 96 CP correction at the aug-cc-pVQZ level typically amounts to <0.1 kcal/mol, indicating convergence to the DFT/CBS limit.For larger systems, the magnitude of the CP correction in the def2-ma-QZVP basis set affords an estimate of the uncertainty in the DFT/CBS limit.For double-hybrid functionals, we estimate the CBS limit using the SCF/aug-cc-pVQZ hybrid DFT energy plus a two-point "T/Q" extrapolation of the MP2 correlation energy, where the latter is computed using aug-cc-pVTZ and aug-cc-pVQZ. 97ll calculations were performed using Q-Chem v. 5.4. 98ntegral screening and shell-pair drop tolerances were both set to τ ints = 10 −12 a.u., and the self-consistent field (SCF) convergence threshold was set to τ SCF = 10 −8 Ha, except for the protein−ligand systems in Section 3.3.2,for which we use τ ints = 10 −10 a.u. and τ SCF = 10 −6 Ha.The SG-2 quadrature grid is used for all B97-based functionals and SG-3 for the Minnesota functionals. 62For other functionals, we use SG-1. 99

RESULTS AND DISCUSSION
3.1.S66 Data Set.We will use the S66 data set 96 to obtain a systematic understanding of the effect of CP correction on ΔE int .The largest of the S66 dimers contains 34 atoms (pentane dimer), and this small size allows us to use quadruple-ζ calculations even for computationally demanding functionals and thus to definitively establish the CBS limit.

Convergence Errors for Conventional DFT.
A statistical summary of basis-set convergence errors for S66 is provided in Table 2, where the errors are defined with respect to the DFT/CBS limit for each functional.More detailed statistics can be found in Tables S1−S7, including a breakdown into subsets consisting of hydrogen-bonded dimers, dispersionbound dimers, and dimers of mixed-influence interactions.Data for M06-2X+D3(0) are omitted from Table 2 but can be found in Table S2.
It proves useful to examine Table 2 by class of basis set.For the Dunning basis sets, CP correction has a significant effect only at the double-ζ level.For aug-cc-pVDZ, the mean absolute error (MAE) with respect to the DFT/CBS limit is reduced from 0.7 to 0.2 kcal/mol (averaging across functionals) when the CP correction is included.(These statistics exclude the double-hybrid functionals, whose basis-set convergence is quite different and which are considered later.)With the exception of the M06-L functional, this difference of ≈0.5 kcal/mol between CP and non-CP interaction energies is larger than the difference between DFT/CBS and CCSD(T)/CBS interaction energies for the same data set (Table 1).When the basis set is extended to augcc-pVTZ, the CP correction is nearly negligible (≲0.1 kcal/ mol), except in the case of M06-L, and for aug-cc-pVQZ it is negligible in all cases.Notably, the CP-corrected aug-cc-pVDZ convergence errors are generally close to the uncorrected augcc-pVTZ values, such that results of triple-ζ quality can be obtained at double-ζ cost by applying CP correction.
Most practical DFT calculations do not employ correlationconsistent basis sets, so we next turn to the Karlsruhe basis sets that were designed specifically for SCF calculations.We tested versions with 85 and without 84 augmentation by diffuse functions, finding that diffuse functions afford systematically smaller convergence errors both before and after CP correction.Otherwise, the behavior of the Karlsruhe basis sets is similar to that of the Dunning basis sets, albeit with somewhat larger convergence errors, especially at the double-ζ level.Similar to the aug-cc-pVDZ case, however, def2-SVPD with CP correction is sufficient to reduce the convergence error so that it is smaller than the intrinsic accuracy of the functional itself.In the absence of the CP correction, this cannot be said of the double-ζ basis sets.A corollary is that once again, triple-ζ basis sets are not required to reach the Journal of Chemical Theory and Computation intrinsic accuracy limit, as CP-corrected def2-SVPD convergence errors are within ∼0.1 kcal/mol of def2-QZVPD values.
Pople basis sets remain workhorses in DFT, and it is therefore interesting to note that in order to reduce convergence errors below 1.0 kcal/mol, CP correction is required in most (although not all) cases.With the exception of the slowly convergent M06-L functional, convergence errors for all CP-corrected functionals are ≲0.5 kcal/mol.These errors are systematically smaller when diffuse functions are added, yet CP correction is still required to reduce the convergence error to something comparable to the intrinsic accuracy of the functional in the DFT/CBS limit.
So far, this discussion has been confined to error statistics with respect to the DFT/CBS limit, but it is important not to lose sight of accuracy with respect to benchmark interaction energies.Table S8 lists error statistics with respect to CCSD(T)/CBS benchmarks 96 for the functionals and basis sets in Table 2.With the exception of the notably less accurate M06-L functional, the errors fall within <0.6 kcal/mol of the benchmarks when CP correction is employed but exceed 2 kcal/mol in some double-ζ basis sets when the CP correction is omitted.With CP correction and diffuse functions (i.e., excluding the basis sets 6-31G* and def2-SVP), the double-ζ errors are ≲0.4 kcal/mol.CP correction is thus crucial not only for achieving the CBS limit but also for improving the accuracy of small-basis interaction energy calculations.

Convergence Errors for Double-Hybrid Functionals.
Convergence error statistics with respect to the CBS limit are summarized for double-hybrid functionals in Table 3. Inclusion of MP2 correlation means that we expect the convergence behavior of these functionals to resemble that of correlated wave function methods, which is generally much slower than DFT, and the convergence behavior of MP2 itself is also examined in Table 3.We observe that MP2 convergence errors in Pople basis sets and in def2-SVP are larger with CP correction than without, which can simply be ascribed to the fact that these basis sets are not appropriate for a correlated wave function method.A more incisive explanation is gleaned by examining the raw data for S66 interaction energies computed at the MP2 level (Table S9), which reveals that MP2 severely underestimates ΔE int for the dispersion-bound subset of S66 when these basis sets are employed.Since the absence of CP correction leads to a larger interaction energy, there is a partial error cancellation that is upset by CP correction.This behavior contrasts with the systematically smaller convergence errors obtained when CP correction is applied to GGA and hybrid functionals, and it sets up a competition between that behavior and the behavior of MP2 correlation in the case of double-hybrid functionals, when very small basis sets are employed.This leads to slightly less systematic convergence behavior than what was observed for GGA and hybrid functionals; CP correction generally reduces the convergence errors, but by amounts that are somewhat functional-dependent.
That said, if we exclude def2-SVP and the Pople basis sets, then convergence to the DH-DFT/CBS limit is reasonable and is facilitated by CP correction, especially in double-ζ basis sets such as def2-SVPD.A graphical example is presented in Figure 1 for Karlsruhe basis sets; see Figures S1−S3 for additional functionals and basis sets.Using CP correction in conjunction with def2-SVPD results in a mean convergence error of 0.2 kcal/mol, comparable to the CP-corrected def2-TZVPD error.While diffuse functions prove to be important (e.g., the def2-TZVP convergence error is a bit larger at 0.3 kcal/mol, even with CP correction), this analysis demonstrates that correlation-consistent basis sets are not required to reach the DH-DFT/CBS limit, which can facilitate significant cost savings.Convergence errors remain larger in Pople basis sets (see Figure S3) and are only somewhat mitigated (in most cases) by CP correction.These basis sets are therefore not recommended for double-hybrid functionals.
3.1.3.Error Distributions. Figure 2 plots how the basis-set convergence errors are distributed with respect to zero for one particular functional using a variety of basis sets and comparing both CP-corrected and uncorrected results.(For other functionals, including double hybrids, see Figures S4 and  S5.)In larger basis sets, the error distributions narrow considerably and def2-TZVPD is the smallest basis set for which all the data points lie within ±0.5 kcal/mol of the CBS limit, which is true in that case even without CP correction.For aug-cc-pVDZ and def2-SVPD, the CP-corrected errors are much more tightly grouped together as compared to the uncorrected results, with a mean signed error that tilts toward underbinding but by a much smaller amount as compared to the overbinding that is observed in the absence of CP correction.In the aug-cc-pVXZ data, one can perhaps find justification for the statement that "CP overcorrects for BSSE", yet it is equally true that the uncorrected results systematically underbind the S66 complexes, and as a result the half-CP average is more reliable than either the CP-corrected or uncorrected values of ΔE int .That said, CP correction reduces the MAE with respect to the CBS limit and also significantly reduces the spread of the convergence errors.Quantitative data for each functional and basis set, as measured by root-meansquare deviations from the DFT/CBS limit, can be found in Tables S15 and S16.
These trends are more muddled in the case of Pople basis sets, where CP-DFT does not consistently under-or overbind the complexes, and the distribution of CP-DFT convergence errors is centered close to zero.For example, in the case of the BLYP+D3(BJ) functional, the mean signed errors range from −0.07 to +0.10 kcal/mol for the four Pople basis sets shown in Figure 2c; see Table S4 for the quantitative data.If CP correction is neglected, the dimers are generally overbound, with the mean signed errors ranging up to −2.3 kcal/mol.Karlsruhe basis sets (Figure 2b) represent something of an intermediate case, in that CP-corrected results are not always underbound but convergence to the CBS limit is systematic.
Other functionals behave similarly to BLYP+D3(BJ), except for the problematic M06-L (Figure S4), for which CPcorrected convergence errors show a significant spread on both sides of zero and the distributions of CP-corrected and uncorrected values are difficult to distinguish.As such, CP correction may either improve or degrade the results but does so more or less at random, indicative of an overall failure of systematic convergence to the CBS limit. 60For the doublehybrid functionals (Figure S5), Dunning and Karlsruhe basis sets behave similarly to what is seen for BLYP+D3(BJ) in Figure 2b,c, but Pople basis sets clearly underbind the complexes when CP correction is applied, whereas the results without CP correction are overbound.
In general, it can be said that the CP-corrected convergence errors span a smaller range as compared to convergence errors when CP correction is omitted.This discrepancy is especially prevalent in double-ζ basis sets and in all Pople basis sets.These observations parallel what has been observed for wave function methods, 50 although the magnitude of the CP correction is different, reflecting the different convergence behavior of DFT.The same trends hold for double-hybrid functionals if Pople basis sets (for which CP correction has only a modest effect) are excluded.
3.1.4.Timing Data.We have observed that CP-corrected double-ζ calculations often afford interaction energies on par with uncorrected triple-ζ results, so it is interesting to consider which of these methods is less expensive.We assess this using pentane dimer as it is the largest of the S66 complexes, and Table S17 presents timing data using each of the functionals and basis sets whose convergence statistics are given in Table 2.In any given basis set, CP correction adds about a factor of 2.5× to the cost of the calculation, which reflects the cost of performing two monomer calculations in the dimer rather than the monomer basis set.This additional overhead pales in comparison to the increased cost of extending the basis set from double-to triple-ζ, however.All three double-ζ

Journal of Chemical Theory and Computation
calculations in the dimer basis set can typically be performed in 25−50% of the time required for a single triple-ζ calculation of the dimer.If a triple-ζ level of convergence is desired, then CP correction is about 4× less expensive than a single quadruple-ζ calculation on the dimer.
An illustration of the trade-off between accuracy and cost is presented in Figure 3 using convergence errors for the whole of S66, with BLYP+D3(BJ) and Karlsruhe basis sets.(For other functionals and basis sets, see Figures S6−S8.)Timing data are shown for pentane dimer.In the absence of CP correction, double-ζ basis sets afford unacceptably large convergence errors (1.5−2.5 kcal/mol on average), which the CP correction reduces below 0.25 kcal/mol, smaller than the intrinsic accuracy of the functional for this data set.All three calculations required for the CP-corrected double-ζ result can be obtained in about 75% of the compute time required for a single triple-ζ calculation, yet basis-set convergence errors and accuracy with respect to CCSD(T)/CBS are comparable to those obtained using def2-TZVPD.This comparison demonstrates that a CP-corrected double-ζ calculation is a cost-effective strategy to obtain results of triple-ζ quality.

(Coronene) 2 at Displaced
Geometries.The S66 calculations reported above were performed exclusively at equilibrium geometries.Since BSSE arises from overlapping monomer-centered basis functions, it should be sensitive to monomer separation, so we next examine results for coronene dimer, (C 24 H 12 ) 2 , as a function of intermolecular separation.We placed the dimer in an eclipsed cofacial ("sandwich") orientation at intermolecular separations ranging from R = 3 to 10 Å. Potential energy curves, both with and without CP correction, are shown in Figure 4 for the ωB97M-V functional using Karlsruhe basis sets.Results for other functionals and basis sets can be found in Figures S9 and S10 but are quite similar to the data in Figure 4.
In the absence of CP correction (Figure 4a), the minimumenergy separation of the dimer changes as a function of basis set, increasing from 3.50 Å in double-ζ basis sets to 3.75 Å in larger basis sets.The potential curve is also much shallower in the larger basis sets, consistent with too-large values of ΔE int due to BSSE, and the effect is larger than it was for the smaller S66 dimers.The def2-SVP and def2-SVPD basis sets overestimate ΔE int by 7.0 and 5.0 kcal/mol, respectively, as compared to the def2-QZVPD value.On the other hand, triple-ζ results are essentially converged to the CBS limit.
Upon CP correction (Figure 4b), all Karlsruhe basis sets afford very similar potential energy curves.There is no longer any meaningful distinction between def2-SVP and def2-SVPD results, both of which overestimate ΔE int by about 2.0 kcal/mol relative to triple-and quadruple-ζ results that are indistinguishable from one another.Moreover, the shapes of the CP-DFT/double-ζ potentials are the same as the converged results, with the same minimum-energy separation of R = 3.75 Å.In the absence of CP correction, we conclude that double-ζ basis sets cannot be assumed to afford correct geometries.
A new CCSD(T)/CBS interaction energy has recently been reported for coronene dimer, 100 and in Table 4 we use that benchmark to assess errors in ωB97M-V interaction energies using Karlsruhe basis sets.Although this functional affords subkcal/mol errors for the S66 dimers, that level of accuracy is not realized here, where the errors remain 2−3 kcal/mol upon CP correction.However, in the absence of CP correction, the relative errors (as a percentage of the total interaction energy) are actually larger for the S66 data set, where they range from 3% (def2-QZVPD) to 38% (def2-SVP).For CP-corrected calculations, errors in the S66 data set range from 3% (def2-QZVPD) to 9% (def2-SVP).Combined with the (coronene) 2 results, this suggests an intrinsic accuracy of ≈10% for ωB97M-V, but that intrinsic limit cannot be reached in double-ζ basis sets unless CP correction is used.A peculiarity in the coronene dimer results is that the def2-SVPD error is significantly larger (at 15 kcal/mol) than the def2-SVP error (8 kcal/mol).While CP correction reduces both errors to 3 kcal/mol, the larger error when diffuse functions are added defies the conventional wisdom that a good description of the tails of the density is important to the description of noncovalent interactions.In fact, larger errors for def2-SVPD than for def2-SVP are a feature of the dispersion-bound subset of S66 and also for every functional that we have examined (see Tables S1−S7), although the overall S66 convergence errors are reduced in the presence of diffuse functions.(This is primarily due to a reduction in convergence errors for the hydrogen-bonded subset of S66, and we have observed that diffuse functions are particularly important for the description of hydrogen bonds. 88) For the dispersion-bound complexes in S66, results with def2-SVP lead to overbinding for all functionals examined, and since CP correction generally reduces interaction energies, these results benefit from some error cancellation that disappears when diffuse functions are added.The effect is less overbinding with def2-SVPD.However, this type of error cancellation feels fragile and from our point of view, CP-corrected results obtained using def2-SVPD provide a more reliable representation of the true performance of double-ζ basis sets.

Large Complexes.
For the S66 dimes, CP-DFT in small basis sets can be used to obtain interaction energies that are within ∼0.2 kcal/mol of the DFT/CBS limit for the same functional, a convergence error that is smaller than the intrinsic accuracy of the best contemporary functionals for these same complexes.For coronene dimer, however, the CP correction is larger in double-ζ basis sets.In this section, we examine much larger molecular systems in order to understand how the magnitude of BSSE grows with system size.Systems examined include a set of layered complexes of oligothiophenes and graphene nanoribbons, 101,102 such as the example shown in Figure 5a.These range in size from 134 to 174 atoms and have ΔE int in the range of 60−100 kcal/mol, depending on the functional and basis set.We also examine four protein−ligand systems ranging from 261 to 323 atoms, whose DFT/CBS interaction energies range from 15 to 100 kcal/mol; two of these are shown in Figure 5b,c.As in the calculations reported above, our primary goal is not to evaluate ΔE int against some benchmark value but rather to examine convergence to the DFT/CBS limit and to assess the efficacy of CP correction in obtaining that limit.Geometries are taken from ref 102 and are provided in the Supporting Information.As representative functionals, we consider M06-2X and ωB97M-V using a subset of the basis sets examined above, mostly representative of practical choices for large systems but including the high-quality def2-TZVPD basis set.For the S66 dimers, the latter basis affords absolute convergence errors smaller than 0.15 kcal/mol for the two functionals considered here, and CP-DFT/def2-TZVPD affords interaction energies comparable to DFT/def2-QZVPD for the S66 dimers.
Interaction energies for these layered materials are listed in Table 5.The trends are similar for M06-2X and ωB97M-V, so our discussion will focus on the latter.When that functional is used with a double-ζ basis set, the difference between CPcorrected and uncorrected interaction energies is large, ranging from 12 to 48 kcal/mol.The DZVP basis set, which was previously found to afford accurate interaction energies in small-molecule data sets such as S66, 90 here affords CP corrections that are comparable to those obtained using 6-31+G* and certainly not small.
It should be noted that double-ζ basis sets are not recommended for use with ωB97M-V and similar, combinatorially optimized B97-based functionals. 23,24The def2-SVP basis is specifically suggested to be "incompatible" with ωB97M-V, 24 and def2-TZVPPD is suggested as the smallest recommended basis set.These recommendations are based in part on thermochemical data and may be overly conservative for noncovalent interactions; thus, it is relevant to consider smaller double-ζ basis sets here.The M06-2X functional, which is routinely used with small Pople basis sets, actually converges to the DFT/CBS limit in a manner that is very similar to ωB97M-V, as measured by the difference between CP-corrected and uncorrected values of ΔE int .
Despite the recommendation against double-ζ basis sets, with CP correction the ωB97M-V interaction energies are remarkably close to def2-TZVPD results, within 3 kcal/mol in all cases except 6-31G*.In fact, M06-2X exhibits slightly larger deviations between CP-corrected double-and triple-ζ results.It is perhaps surprising to note that diffuse functions have only a small effect when CP correction is used: ≲4 kcal/mol difference between def2-SVP and def2-SVPD results, and <0.5  kcal/mol difference between def2-TZVP and def2-TZVPD.Presumably, the use of a supramolecular basis set facilitates a good description of the tails of the density, whereas in the absence of CP correction diffuse functions are needed to describe these tails in the monomer calculations.This observation is good news for calculations on systems of this size, where diffuse functions add significantly to the cost and also incur problems with linear dependencies.That said, when double-ζ basis sets are employed, CP correction is essential in order to mitigate the effect of the diffuse functions; without it, def2-SVP and def2-SVPD interaction energies differ by >20 kcal/mol in some cases.For triple-ζ basis sets, the effect of diffuse functions is ≲1 kcal/mol even without CP correction.
Regarding the results with the smallest basis sets, such as 6-31G* (for which the CP-corrected and uncorrected values of ΔE int differ by as much as 30 kcal/mol), we note that the ratio |ΔE int (no CP)/ΔE int (CP)| ≈ 1.4.This is only slightly smaller than the ratio of ≈1.5 that is observed for the S66 dimers, using the same functional and basis set.For ωB97M-V/def2-SVP, this ratio is ≈1.3 for the graphene−oligothiophene systems.The similarity of these ratios suggests that neither the behavior of the functional nor that of the CP correction is fundamentally changed by moving to these larger systems.Instead, the larger convergence errors observed for the layered complexes reflects the size extensivity of BSSE, which is often overlooked.
When ωB97M-V is used with a larger def2-TZVPD basis set, the magnitude of the CP correction is reduced below 4 kcal/ mol or about 4% of |ΔE int | for these complexes.The ratio |ΔE int (no CP)/ΔE int (CP)| is reduced to ≈1.04, which is nearly the same as the ratio of ≈1.03 that is obtained for S66 calculations in this basis set.
3.3.2.Protein−Ligand Complexes.Finally, we consider four protein−ligand complexes: 181L:benzene (Figure 5b), 103 1LI2:phenol, 104 1HSG:indinavir (Figure 5c), 105 and 1O44:(C 34 H 35 N 3 O 9 ). 106In the latter complex, the ligand is a malonic acid derivative known as RU85052, and we henceforth refer to this system as 1O44:RU85052.The indinavir and RU85052 complexes involve much larger ligands as compared to benzene and phenol, and the computed values of ΔE int are correspondingly larger for these two systems.
−110 This structure was taken from ref 108.For the other three complexes, crystal structures were obtained from the protein data bank, 111 which were then protonated using the H++ web server. 112The resulting structures were relaxed using the GFN2-xTB semi-empirical quantum chemistry method, 113 using a generalized Born model for water. 114Relaxed structures were trimmed using a 5 Å cutoff (with respect to the ligand) for the 181L and 1LI2 complexes and a 2.5 Å cutoff in the case of 1O44, where the ligand is much larger.The coordinates that we used in the quantum chemistry calculations can be found in the Supporting Information.
We will again test M06-2X and ωB97M-V as representative functionals, using a variety of basis sets, and the corresponding interaction energies are listed in Table 6.The basis sets are standard except for minimally augmented ("ma") and heavyaugmented ("ha") versions of the Karlsruhe basis sets, which were introduced in ref 88.Due to the size of these complexes, we use the average of CP-corrected and uncorrected DFT/ def2-ma-QZVP interaction energies to estimate the DFT/CBS limit for the ligand−protein complexes.By examining the entries for def2-ma-QZVP in Table 6, one can see that the two values that are averaged lie within 0.8 kcal/mol of one another when ωB97M-V is used and within 1.2 kcal/mol for M06-2X.This provides some measure of the uncertainty in establishing the DFT/CBS limit.
Broadly speaking, these systems exhibit the same trends observed above, namely, that CP corrections are sizable in double-ζ basis sets but modest in triple-ζ bases.For the more weakly interacting complexes where the ligand is benzene or phenol, CP correction moves ΔE int about 2 kcal/mol closer to the CBS limit for M06-2X/6-31+G* and about 3 kcal/mol closer for ωB97M-V/6-31+G*.For ωB97M-V/def2-SVP, the CP correction reduces the convergence error by 1−2 kcal/mol, although for M06-2X/def2-SVP, the CP correction actually increases the convergence error by 1−3 kcal/mol.CP-ωB97M-V/6-31+G* results for 181L:benzene and 1LI2:phenol are essentially converged to the CBS limit, and the CP-M06-2X/6-31+G* results are only about 1 kcal/mol from that limit.In these cases, the CP correction improves the agreement with the DFT/CBS result by 3−4 kcal/mol or 20−25% of the total interaction energy.For the more strongly bound complexes

Journal of Chemical Theory and Computation
(1HSG:indinavir and 1O44:RU85052), CP-corrected 6-31+G* results are even more impressive, affording results that are only about 1 kcal/mol from the DFT/CBS limit.Nevertheless, CP correction is essential and exceeds 10 kcal/ mol.This is true even for the DZVP basis set that afforded good accuracy (without CP correction) for small-molecule calculations reported in ref 90, including the S66 data set.
Those results likely benefit from fortuitous error cancelation made possible by the relatively small magnitude of the CP corrections for the S66 dimers.
Notably, the addition of diffuse functions to def2-SVP significantly increases the convergence errors for both functionals.This behavior was also observed for coronene dimer (Section 3.2), where it was explained by analogy to the basisset convergence properties of the dispersion-bound subset of the S66 complexes.Consistent with those examples, here we find that CP correction significantly insulates the results

Journal of Chemical Theory and Computation
against this effect.With CP correction, the def2-ha-SVP basis set (in which hydrogen atoms do not have diffuse functions) performs very similar to fully augmented def2-SVPD, although the results in the minimally augmented def2-ma-SVP basis set are noticeably different.This suggests def2-ha-SVP (with CP correction) as a low-cost choice for large systems, in which convergence errors have been reduced to ≲1 kcal/mol with respect to the DFT/CBS limit.This basis is defined for most of the periodic table, whereas 6-31+G* is only defined for elements up to argon.
A visual perspective on convergence to the ωB97M-V/CBS limit for these systems is presented in Figure 6 for two of the ligand−protein complexes.(The corresponding plots for the other two complexes can be found in Figure S11 and for M06-2X in Figure S12.)In small basis sets, the CP-corrected value of ΔE int is much closer to the CBS limit as compared to the uncorrected interaction energy, which is systematically overbound.CP correction reduces the interaction energy in a systematic way as the CBS limit is approached, and although its effect is diminished in larger basis sets, the CP-corrected interaction energies hew more closely to the DFT/CBS limit than do the uncorrected values, with the former being close to convergence already for def2-SVPD.This parallels what we have seen for other systems and suggests that double-ζ interaction energies are completely unreliable for large systems unless CP correction is employed.Unlike the results in smaller systems, however, the triple-ζ results cannot be considered to be converged in these examples (even in the presence of diffuse functions) unless CP correction is applied, although the fully augmented def2-TZVPD basis set comes close.Timing data for the 1HSG:indinavir complex (Figure S13) demonstrate that CP-ωB97M-V/def2-SVPD is 6× less expensive than the def2-ma-QZVP basis set that is otherwise needed to reach the CBS limit.
Finally, in an effort to gauge the absolute accuracy of these ligand−protein interaction energies, we take advantage of a recent CCSD(T)/CBS benchmark for the 1HSG:indinavir system. 110Table 7 shows how ωB97M-V performs against this benchmark, using various basis sets.CP-corrected values of ΔE int lie within 10 kcal/mol of the benchmark (representing 8% error), even for basis sets as small as def2-SVP.This is certainly a much larger absolute error as compared to those observed for S66 dimers, but the percentage errors are typically 7−8% when CP correction is used.(The ωB97M-V/def2-SVP value is slightly more accurate.)Errors increase substantially in the absence of CP correction, up to 54 kcal/mol (45% error) in the smallest basis set tested.

CONCLUSIONS
Far from being an "overcorrection", the Boys−Bernardi CP procedure is an essential aspect of noncovalent quantum chemistry that significantly reduces the finite-basis error in intermolecular interaction energies and thus showcases the true behavior of the functional in question, free of error cancellation.CP-corrected interaction energies computed using double-ζ basis sets are comparable in quality to uncorrected triple-ζ results and therefore close to the DFT/ CBS limit, even for combinatorially optimized functionals such as ωB97X-V and ωB97M-V that are considered to be more demanding in their basis-set dependence. 23,24Perhaps surprisingly, these conclusions also hold for double-hybrid functionals that contain MP2 correlation and therefore exhibit Figure 6.Interaction energies for (a) 1LI2:phenol and (b) 1O44:RU85052 computed using ωB97M-V in various basis sets.Connected points illustrate convergence of ΔE int in Karlsruhe basis sets, both with and without CP correction.For comparison, values of ΔE int in the 6-31+G* basis set are also shown.The ωB97M-V/CBS limit, obtained by averaging def2-ma-QZVP interaction energies with and without CP correction, is indicated by the horizontal line, and its numerical value is also indicated.slower basis-set convergence (and also increased cost).Even for lower-cost GGA and hybrid functionals, the use of CP correction with double-ζ basis sets results in computational savings, as this approach is 3−4× faster than computing CPcorrected triple-ζ interaction energies and somewhat faster (1.2−1.6×)than computing triple-ζ interaction energies with no CP correction.For double-ζ basis sets, the use of a supramolecular basis (i.e., CP correction) significantly reduces the importance of diffuse basis functions.For example, we find that CP-corrected def2-SVP and def2-SVPD values of ΔE int are similar except in the case of double-hybrid functionals, whereas in the absence of CP correction the diffuse functions significantly alter ΔE int .For the coronene dimer, we find that double-ζ basis sets predict an incorrect intermolecular separation unless CP correction is applied, whereas the CP-corrected double-ζ geometry agrees with the results obtained using quadruple-ζ basis sets.These are important observations, given that diffuse functions are expensive and numerically problematic for large systems.
In systems with hundreds of atoms, including layered nanocomposite materials and protein−ligand complexes, the magnitude of the CP correction is much larger than it is in small benchmark dimers, which reflects the size extensivity of BSSE.In the absence of CP correction, double-ζ results for systems of this size may be 30−40 kcal/mol (or more) from the DFT/CBS limit and should be considered unreliable, which is notable given the prevalence of such calculations in the literature.On the other hand, CP-corrected interaction energies converge toward the CBS limit in a systematic manner, and they do so much more rapidly than the uncorrected results.CP correction again proves to be costeffective despite the added expense of computing the monomer energies in a supramolecular basis set.We recommend CP-DFT/def2-ha-SVP as a robust and relatively low-cost approach to compute ΔE int in large systems.
We find no realistic use cases where CP correction is disadvantageous.(CP correction can increase the convergence error when double-hybrid functionals are used with small Pople basis sets or with def2-SVP, but this reflects the inadequacy of those basis sets for MP2 correlation rather than a limitation of the CP procedure.)CP correction increases the cost of the calculation by a modest factor, typically ≈2.5×, even for large protein−ligand complexes with hundreds of atoms, and is therefore feasible whenever the supramolecular calculation itself is feasible.CP correction can be used with double-ζ basis sets to obtain reliable results in large systems where triple-ζ calculations are intractable.We suggest that its use should be considered mandatory for supramolecular calculations of ΔE int in large systems, especially when only double-ζ calculations are computationally feasible, in order to ensure that the results do not benefit from error cancellation that may not be robust.Even with triple-ζ basis sets, CP correction can compensate for the absence of diffuse functions and furthermore provides a means to estimate how far the finite-basis value of ΔE int may be from the DFT/CBS limit.

Figure 3 .
Figure3.Mean absolute convergence errors for BLYP+D3(BJ) calculations of the S66 complexes (bar graph, axis at left) vs wall time required for pentane dimer (orange dots, axis at right).Calculations were performed on a single compute node with 14 processors.

Figure 4 .
Figure 4. Potential energy profiles for the eclipsed-cofacial configuration of (coronene) 2 , computed using ωB97M-V in various basis sets (a) without CP correction and (b) with CP correction.

Figure 5 .
Figure 5. Examples of large molecular assemblies considered in this work: (a) (5PT)(C 38 H 22 )(5PT), (b) 181L:benzene, and (c) 1HSG:indinavir.In (b,c), the ligand is shown in a ball-and-stick representation and the protein binding site is shown using a tubular representation.

Table 2 .
Basis-Set Convergence Errors for S66 Interaction Energies mean absolute error (kcal/mol), vs DFT/CBS a aDFT/CBS is defined as the average of CP-corrected and uncorrected aug-cc-pVQZ interaction energies.

Table 3 .
Basis-Set Convergence Errors for S66 Interaction Energies Figure1.Mean absolute convergence errors (with respect to the CBS limit) for S66 interaction energies computed using the double-hybrid functional ωB97M(2) in various Karlsruhe basis sets.

Table 4 .
Accuracy of ωB97M-V Interaction Energies for Coronene Dimer error vs CCSD(T)/CBS a a Benchmark from ref 100.

Table 5 .
Interaction Energies (in kcal/mol) for Graphene−Oligothiophene Complexes ΔE int without CP correction.b ΔE int with CP correction.c Difference between CP-corrected and uncorrected values of ΔE int . a

Table 6 .
Interaction Energies (in kcal/mol) for Ligand−Protein Complexes a Errors relative to the DFT/CBS limit are given in parentheses.b ΔE int without CP correction.c ΔE int with CP correction. a

Table 7 .
Errors in ΔE int for 1HSG:Indinavir Using ωB97M-V a Versus the benchmark value ΔE int = −121.50kcal/mol from ref 110.b Error without CP correction.c Error with CP correction.