Optimization of a Range-Separated UW12 Hybrid Functional

In a previous paper we presented a new hybrid functional B-LYP-osUW12-D3(BJ) containing the Unsöld-w12 (UW12) hybrid correlation model. In this paper we present a new 15-parameter range-separated hybrid density functional using a power series expansion together with UW12 correlation. This functional is optimized using the survival of the fittest strategy developed for the ωB97X-V functional, fitted to data from the Main Group Chemistry Database (MGCDB84). In addition we optimize a standard hybrid and double hybrid using the same method. We show that our fully self-consistent UW12 hybrid functional WM21-D3(BJ) outperforms both of these functionals and other range-separated hybrid functionals.


Introduction
Density-functional theory (DFT) has in recent years become the most widely used electronic structure method. 1,2 Much of this success derives from the ability of DFT to achieve reasonable accuracy at a relatively low cost.
While exact in theory, Kohn-Sham density-functional theory contains an unknown energy component -the exchange-correlation (XC) functional -which must be approximated. Thus the accuracy of practical DFT calculations depends on the choice of exchange-correlation functional. [3][4][5][6] There is no systematic method to improve the exchange-correlation functional, but strategies have evolved to improve accuracy through including additional terms in the exchangecorrelation functional. The simplest density-functional is the local-density approximation (LDA) in which the energy density depends only on the electron density at each point. Accuracy may be improved by including terms that depend on the gradient of the electron density, forming the generalized-gradient approximation (GGA). There are several widelyused functionals of this type, including BLYP and PBE. [7][8][9] Further improvements in accuracy can be attained by including higher order density gradients and the kinetic-energy density. 10 However there appears to be a limit to the accuracy achievable through such approximations. Problems with these semi-local functionals can be related to the self-interaction error (SIE), arising from the incomplete cancellation between the self-interacting portions of Coulomb and exchange energies. 11 In semi-local functionals, the potential felt by a removed electron from a neutral molecule exhibits exponential decay in the long range, rather the the physically correct −r −1 decay behavior. 12,13 Hybrid density-functional methods attempt to reduce these errors by replacing a fraction of approximate density-functional exchange with exact Hartree-Fock exchange. 14,15 Hybrid functionals remain some the most popular density functionals in use today, though they are not free from problems. Most popular hybrid functionals such as B3LYP and PBE0 contain a relatively small fraction of exact exchange, 16,17 so they still contain a large amount of selfinteraction error. Also, while Hartree-Fock exchange has the correct long-range behavior, global hybrid functionals with a fixed fraction c x do not, since the long-range exchange potential behaves as −c x r −1 instead of the correct −r −1 .
One solution to this problem is through range-separation, where the Coulomb operator is split into short-and long-range components, such that the amount of exact exchange varies as a function of inter-electronic distance. 18,19 Such functionals are called range-separated hybrids (RSHs), and can be constructed to ensure the correct long-range behavior. In addition, these functionals exhibit reduced self-interaction error according to typical measures. 11 It is also possible to further increase the accuracy by including some amount of non-local correlation by introducing a dependence on the virtual (unoccupied) orbitals. 20,21 Double hybrid functionals achieve this by including a fraction of MP2 correlation; they have received much interest in recent years. [22][23][24] The UW12 approximation is a non-local correlation approximation given by 25 for occupied spin orbitals |i , |j , virtual spin orbitals projector |a , |b , and Coulomb operator r −1 12 , and two-electron geminal function w 12 . This approximation may be formulated in a way which includes no virtual orbital dependence. 25 Global hybrid functionals which include UW12 correlation such as B-LYP-osUW12 have been shown to give accurate results without a virtual orbital dependence. 26 In this work we wish to create a range-separated UW12 hybrid functional with correct long-range behavior using the ωB97 exchange-correlation functional as a starting point. Of particular interest are the ωB97X-V, ωB97M-V and ωB97M(2) functionals optimized using the survival of the fittest strategy. [27][28][29] These range-separated hybrid functionals are among the most accurate functionals currently available.
The range-separated GGA hybrid ωB97X-V is a 10 parameter functional containing range-corrected B97 exchange, B97 correlation, and the non-local VV10 functional, 30-32 with the parameters chosen to minimize the error in both the training and validation datasets considered. This method was then extended to produce the range-separated meta-GGA hybrid ωB97M-V, utilizing the large MGCDB84 database for optimization, and including meta-GGA components in the optimization, 5 as well as an MP2 energy correction in ωB97M(2). 33 Following this approach we set out to create a range-separated hybrid functional of the form ωB97X-osUW12 using the ωB97 functional together with opposite-spin UW12 correlation and a dispersion correction. We do not include any meta-GGA contribution in this work, but note that this could be included in the future.

Theory
The functional we aim to create contains multiple components and we begin with a brief discussion of each of them.
The Hartree-Fock exchange energy for a molecular system is given by: for r 12 = | r 1 − r 2 | and occupied spin orbitals |i σ , |j σ . In the local spin density approximation (LSDA), the exchange functional is given by where is the exchange energy per unit volume.
In the range-separation scheme, the Coulomb operator is split into short-and long-range parts using the error function such that 18 for range-separation parameter ω. The long-range exchange interaction is then given by where the kernel erf(ωr 12 )r −1 12 tends to r −1 12 as r 12 → ∞. By replacing the r −1 12 operator with erfc(ωr 12 )r −1 12 in the derivation of the LSDA exchange energy, the short range LSDA functional may be written as for where a σ = ω/k F,σ , and k F,σ = [6π 2 ρ σ ] 1/3 is the spin-polarized Fermi wave vector. 18 The original range-separation scheme consists of short-range density-functional exchange and long-range HF exchange, with no HF exchange at short range. It has been shown that including a small fraction c sr x of short-range HF exchange in addition to the short-range DFT exchange is beneficial without affecting the correct long-range behavior. 34 While the LSDA exchange functional is helpful to illustrate range-separation in exchange functionals, it is not suitable for most applications due to the homogeneous nature of the functional. A better approach is to use a GGA functional for the density functional exchange component. The B97 functional uses a power series expansion involving the density and its first derivative: 30 where g x is a power series given by with s σ = |∇ρ σ |/ρ 4/3 σ , and where c i x is the ith coefficient. This expansion includes the nonlinear exchange parameter γ x = 0.004, 35 while the linear parameters {c i x } may be determined. In a similar manner to the LSDA exchange functional, the B97 exchange functional may be modified to produce the short-range B97 exchange functional given by: 31 The B97 correlation functional is constructed in a similar way, with separate expansions for the same and opposite spin components such that E B97 c,os = e LSDA c,αβ (ρ α , ρ β )g c,os (s αβ ) d r, for s 2 αβ = (s α +s β )/2. The power series expansions in these expressions are like equation (10), with different linear parameters {c i c,ss } and {c i c,ss }, and non-linear parameters γ c,ss = 0.02 and γ c,os = 0.006 respectively. These were fitted to the correlation energies of He and Ne. 35 The same-and opposite-spin LSDA correlation energy densities per unit volume e LSDA c,σσ , e LSDA c,αβ are extracted from the total LSDA correlation energy density using: The linear coefficients {c i c,ss } and {c i c,os } may be optimized systematically in the same manner as the exchange coefficients.
The UW12 correlation energy can also be split into the same and opposite spin compo- where w ss 12 and w os 12 are the same-and opposite-spin geminal functions respectively, with projectorQ 12 given bŷ for whichô andv are projectors onto the occupied and virtual spaces respectively. Using this relation the UW12 energy for a given geminal may be evaluated either using the virtual space or entirely within the occupied space. In the virtual space, the total UW12 energy may be evaluated for a fixed set of orbitals using equation (1). In the occupied space, the UW12 energy may be written solely in terms of the one-particle reduced density matrix allowing the orbitals to be optimized self-consistently within a generalized Kohn-Sham scheme.
The accuracy of the approximation depends on the choice of geminal functions w os 12 and w ss 12 . In previous work we have set w ss 12 = κw os 12 , with κ = 0.5 in XCH-BLYP-UW12, 25 or κ = 0 in B-LYP-osUW12, where the same-spin contribution is neglected. In these functionals, the opposite-spin geminal has been a one-parameter exponential function: 36 Integrals involving this kernel have been derived, 36 but we instead expand w os 12 as for exponents {γ i } and coefficients {c i u }. For a fixed set of exponents the coefficients are fitted to the exponential function, and in this way the same code may be used to model other kernels.
Rather than constraining this expansion to fit a fixed function, the coefficients could be fitted directly to data, and this forms an integral part of the development described in this paper. In this way the opposite and same spin coefficients could be optimized separately, however, in this work we continue to use the opposite-spin only UW12 (osUW12) approach It has been shown that electronic dispersion forces are crucial for a correct description of long-range electron correlation. [37][38][39] However, most standard XC functionals fail to account for these effects since they use local properties to calculate the XC energy. This lack of dispersion is a significant problem in DFT with a number of solutions suggested. Many modern density-functionals include a dispersion-correction term to account for these effects.
One approach is to use empirical dispersion corrections, such as the D3(BJ) and D4 corrections. 40,41 While another is to create long-range (non-local) density functionals specifically designed to account for these effects, such as the VV10 functional. 32 In this work we use the D3(BJ) correction where C AB (n) are the dispersion coefficients between atoms A, B, with inter-atomic distance R AB , scale factors s n , and Becke-Johnson damping function f (n) damp given by [42][43][44] Which includes functional-specific damping parameters a 0 , a 1 , and R 0 = C AB 8 /C AB 6 . A future possibility would be to re-optimize the final functional to include VV10 rather than D3(BJ). In addition to the non-linear parameters ω, γ x , γ c,ss , γ c,os , γ i , a 1 , and a 2 . rare-gas dimer potential energy curves (RG10). 45,46 The full database is split into three sets for training, validation, and testing, with datasets from each subset distributed among these. We use the training set to fit the functional parameters; it contains 25 datasets and 870 data-points. We use the validation set to determine the whether the fitted functional is transferable; it consists of 35 datasets with 2964 data points. We only use the test set to calculate the accuracy of the final functional; it contains 24 datasets with 1152 data-points. This same splitting was previously used to fit the ωB97M-V and ωB97M(2) functionals. 28,29 The functional ωB97X-osUW12-D3(BJ) consists of multiple parameters; the linear parameters are selected and optimized using the procedure outlined below, while the non-linear parameters are fixed in advance of the optimization. The range-separation parameter ω = 0.3 was chosen as it is the value used in the ωB97X and ωB97X-V functional. 27,31 The B97 parameters γ x , γ c,ss , γ c,os remain fixed using the original B97 values. The non-linear dispersion parameters were chosen as a 1 = 0.00 and a 2 = 5.49. These were taken from the ωB97X- For the linear parameters, we impose the uniform-electron gas limit c 0

Datasets and Fitting
x + c sr x = 1, the longrange exchange limit (c lr x = 1), and the correct leading-order dispersion behavior (s 6 = 1.0). The remaining linear parameters may then be fitted using least-squares regression, with the power series expansions of the B97 functionals all calculated up to fourth order. 48 Fitting is performed using energy components calculated on a fixed set of orbitals, therefore we must choose a functional with which to calculate the initial orbitals. Unlike the optimizations of ωB97X-V and ωB97M-V we chose an established functional to calculate the initial orbitals, namely ωB97X. 31 The fitting procedure starts with the basic functional containing all zeroth order B97 components, the long-and short-range HF exchange, and the D3(BJ) components. This functional is optimized using a (weighted) linear regression on the training set. All possible combinations of additional parameters are added to this initial functional and re-optimized.
The fitted parameters and weighted root-mean-square error (RMSE) for both the training set and validation set are recorded for each combination. The combinations which minimize the error across both these sets are then taken to be the optimal result.
Once the optimal combination of parameters has been chosen, the training set energies are re-evaluated using orbitals calculated with this new fit and the parameters re-fitted to these new energies.
The weighting scheme used is the same as the one used to optimise ωB97M(2). 29 Each data-point in a given dataset is assigned an initial weighting is the total number of data-points in the dataset, and ∆E i is the root-mean-square of the reaction energies in the dataset i. For each data-type, the weights are then normalised by dividing all values by the minimum initial weight w d min , and exponentiated so that they lie in between 1 and 2. During this calculation the AE18 dataset is included in the TCE data-type.
Each data-type is then given a multiplicative weight f d , as shown in table 1. For the RG10 dataset, the set is subdivided the set into attractive and repulsive interactions, with factors f d = 10000 and f d = 1 respectively. 29

Computational Details
All calculations were performed in Entos Qcore unless otherwise noted. 49 Calculations use the Def2-QZVPPD basis set. 50,51 The density-fitting approximation is used to calculate Coulomb, exchange, and UW12 contributions using the Def2-universal-JKFIT basis set. 52 The threeelectron term in the UW12 approximation is calculated using an RI-approximation which utilizes the Def2-QZVP-RI basis set. 53 54 Exchange-correlation contributions were calculated numerically using Neese fixed pruned quadrature grids, 55 56 which use Laqua partitioning. 57 Results for all previously established functionals with the exception of UW12 functionals are taken from Ref. 5.

Results
In order to test the fitting method, the procedure was used to fit an ωB97X-D3(BJ) type hybrid (without UW12). Fitting all possible combinations of parameters and analyzing the weighted error in the validation set results in the best possible functional with 8 linear parameters (see figure 2). When the procedure was repeated with the UW12 components included, the optimal number of parameters was found to be around 15. However, the fitted B97 coefficients for these functional combinations were found to be significantly larger than those present in the non-UW12 fitted functionals. While this effect was significantly reduced if our search was limited to combinations where higher order B97 terms are only included if all lower order terms are already present for each series expansion -where no B97 components were skipped. The coefficients were still significantly greater than in the non-UW12 functional.
To eliminate this problem, it was decided to use regularization on the regression. This was done using ridge regression where an additional parameter α which adds a penalty for the size of coefficients equal to the sum of squares. This method is equivalent to ordinary least-squares regression for α = 0. To determine the optimal value of α, we looked at the weighted RMSE in the validation as a function of α for a single combination at a time.   converged. Table 2 shows the values of the coefficients at each cycle of the optimization. The zeroth cycle corresponds to the original fit using coefficients fitted using energies calculated on ωB97X orbitals. The final converged functional is shown in bold in the final column. We name this functional WM21-D3(BJ).
Throughout this optimization, a majority of the parameters remain mostly unchanged, with the difference between the final and initial values less than 0.05 in all but four cases. The c −4 u value changes by 0.0554, while the first through fourth order same-spin B97 coefficients differ more significantly from the initial values than the other components. The greatest change is seen in the c 1 ss coefficient (0.5958). The large change in these values is the reason for the numerous re-optimization cycles performed on the functional. Also of note is the Table 2: Optimized coefficients for the ωB97X-osUW12-D3(BJ) functional at each stage of the optimization. The first set of coefficients are fitted using energies calculated using the ωB97X functional, subsequent fittings are performed on the training set data obtained using the functional defined in the previous column. The final WM21-D3(BJ) functional parameters are shown in bold.  respectively. This combination of B97 components is slightly different than the one used in ωB97X-V, which does not include c 2 os , including c 1 ss instead, this combination had weighted training and validation set RMSEs of 5.82 kcal mol −1 and 3.96 kcal mol −1 respectively. We refer to this functional as ωB97X21-D3(BJ).
As with WM21-D3(BJ) we re-optimize the coefficients after recalculating the energies in the training set self-consistently and repeating for multiple cycles until converged. Table 3 shows the coefficients for the hybrid functional at each stage of the procedure. Four of the Table 3: Optimized coefficients for the ωB97X21-D3(BJ) functional at each stage of the optimization. The first set of coefficients are fitted using energies calculated using orbitals optimised with the ωB97X functional. Subsequent fittings are performed on the training set data obtained using the functional defined in the previous column. The final functional parameters are shown in bold. coefficients undergo a significant change during the first optimization cycle, with c 1 os , c 2 os , c 0 ss , and c 2 x all changing by more than 0.1, though these have converged after the second iteration. We note that the amount of same spin correlation contribution is greatly reduced by this re-optimization. Also we note that the non-linear parameters for both range-separation and dispersion are not re-optimized with the same ones used for both functionals.
Unlike for the UW12 functional, upon re-optimization, the weighted RMSE of the training set increases compared to the initial orbital version going from 5.49 kcal mol −1 to 6.45 kcal mol −1 in the final functional. A complete summary of the RMSEs for each dataset in the training data at each stage in the optimization of ωB97X21-D3(BJ) is shown in the supporting information. Of the 24 datasets only 9 have reduced errors on re-optimization, though 6 of these errors are reduced by more than 0.1 kcal mol −1 . Large increases of more than 0.1 kcal mol −1 in the RMSE are observed in 10 of the sets, with the greatest increases being for the AE18 atomization energies set, the CRBH20 barrier heights set and the G21EA electron affinity sets which all increase by more than 0.5 kcal mol −1 . The RMSE for the atomization energy set AE18 increases by 2.19 kcal mol −1 , by far the greatest increase. The cause of this increase is most likely due to the small amount of same-spin correlation in the resulting functional, Table 4: Mean absolute errors (MAEs) in kcal mol −1 for each of the datasets in the testing set of MGCDB84. The minimum and maximum errors for each set are highlighted in green and red respectively. with the amount of same-spin correlation greatly reduced in the orbitals compared to the initial ωB97X orbitals.
In order to assess the accuracy of our new functionals we compare the errors for the datasets in the testing set data, this was not utilized during the training and validation of the functionals. We compare to other (range-separated) hybrid functionals, in particular the ωB97X-V functional optimized using the same method. [14][15][16]27,34,40,[58][59][60] Also shown is data from our previous functional B-LYP-osUW12-D3(BJ). Table 4 shows the mean absolute errors for each dataset in the MGCDB84 testing set.
Beginning with the ωB97X21-D3(BJ) functional, it can be easily observed that this func-   As an additional test of this functional, using the same procedure as ωB97X21-D3(BJ) and WM21-D3(BJ), we optimize a double hybrid density functional of the form ωB97X-MP2-D3(BJ). For this we allow the coefficients of opposite-spin and same-spin MP2 to be free parameters. In addition we let the s 6 coefficient in D3(BJ) be free to account for the long-range behavior of MP2. As with ωB97X21-D3(BJ), we use the same non-linear parameters as WM21-D3(BJ) and use ωB97X orbitals to calculate the energies. Following the procedure used previously and taking the combination which minimizes the weighted validation error with the minimum number of parameters without skipping any B97 terms.
The resulting functional is a 12 parameter spin component scaled double hybrid. Unlike the other functionals we do not re-optimize this functional to work with self-consistent orbitals and use ωB97X orbitals for the final functional. Table 5 shows the coefficients in the We compare this functional to the others we have optimized in this paper for the test set data. Figure 4 shows the weighted root-mean-square-errors (wRMSEs) for the testing set data for each of the functionals fitted in this paper, with both the final self-consistent and initial ωB97X orbital versions included for ωB97X21-D3(BJ) and WM21-D3(BJ).
For ωB97X21-D3(BJ), the re-optimization results in an significantly increased error compared to the initial functional fit. This is consistent with what was observed for the training set data. The double hybrid functional ωB97X-MP2-D3(BJ) produces a lower overall error than the hybrid ωB97X21-D3(BJ), including the initial optimized version. However, both versions of WM21-D3(BJ) result in a smaller wRMSE than the double hybrid functional for this set, with the fully-self-consistent WM21-D3(BJ) producing the lowest wRMSE value for these functionals.

Conclusions
We have presented WM21-D3(BJ) -a new range-separated hybrid functional with oppositespin UW12 correlation and D3(BJ) dispersion which we have optimized using the survival of the fittest strategy on the main-group chemistry database (MGCDB84). [27][28][29] We have compared this functional to other range-separated hybrid functionals, including the ωB97X-V functional optimized using the same approach, and shown that for the test set data not used in the optimization procedure WM21-D3(BJ) results in lower errors for the majority of datasets considered without the VV10 non-local functional. The functional also offered significant improvement over the B-LYP-osUW12-D3(BJ) functional for a number of datasets.
We also optimized a standard range-separated hybrid ωB97X21-D3(BJ) using the same approach, this functional produced greater errors than ωB97X-V, despite containing similar components (with the exception of the dispersion component). The WM21-D3(BJ) functional results in much lower errors than ωB97X21-D3(BJ), showcasing the ability of UW12 correlation to improve the accuracy of a calculation.
In addition we compared results for WM21-D3(BJ) to a double hybrid functional ωB97X-MP2-D3(BJ) optimized using the same strategy, but without self-consistent optimization.
We showed that the ωB97X-MP2-D3(BJ) double hybrid fitted to data calculated using ωB97X orbitals produces a greater overall error for the test set data than the initial version of WM21-D3(BJ) calculated in the same way. In addition, by using the self-consistent optimization procedure, the final WM21-D3(BJ) functional results in even lower errors for this set; demonstrating the advantage of the fully self-consistent UW12 approximation.
Overall, UW12 functionals allow increased accuracy compared to standard hybrid func-tionals in a similar manner to double hybrids, but without using the virtual orbitals. In addition, the fully self-consistent nature of UW12 avoids a number of the problems associated with double hybrid functionals. The ability to capture correlation effects neglected by standard DFT functionals using only the occupied orbitals and a simple geminal function is both useful and theoretically interesting.
In future the functional could be re-optimized using the VV10 non-local correlation instead of D3(BJ) as this has been shown to offer great improvement for the standard hybrid ωB97X-V (compared to ωB97X21-D3(BJ)). In addition, following on from ωB97X-V the survival of the fittest strategy has also been used to optimize functionals with meta-GGA contributions such as ωB97M-V and the double hybrid ωB97M(2) -this could be added to a UW12 hybrid to further increase accuracy. In addition, while we have neglected samespin UW12 contributions in this paper, a same-spin UW12 geminal could be optimized in a similar way to the opposite-spin UW12 with a separate geminal function for each spin pairing.

Supporting Information Available
The following files are included for additional information:

Conflict of interest
FRM is Cofounder and CTO of Entos Inc.