Quantification of geometric errors made simple: application to main-group molecular structures

Nearly all electronic structure simulations begin with obtaining approximate geometries, making a systematic quantification of errors in approximate molecular structures of key importance. Recently, the geometric energy offset (GEO) framework based on a single and natural measure for quantifying and analysing these errors has been proposed [J. Phys. Chem. Lett. 2020, 11, 99579964]. An accurate and way less costly approximation to GEO is utilized here to readily quantify errors in main-group structures and analyze them in a chemically intuitive way. The use of semiexperimental geometries as a reference further simplifies the analysis. The analysis reveals new insights into the geometric performance of methods, their rankings, as well as patterns across different classes of methods and basis sets that arise from the analysis.


Introduction and background
The use of electronic structure calculations to rational, guide and support experiments has become a routine in different branches of science. Such practical calculations, be they based on wave-function or density functional theory (DFT) approximations, conflate errors in both approximate molecular geometries and total energies. Thus, a proper analysis of the performance of electronic structure methods requires decoupling these two distinct sources of errors and their separate analysis. The energetic performance of electronic structure methods is often drastically different from their performance for molecular geometries. Methods with comparable energetic performance, for e.g., binding energies of noncovalent systems, can have strikingly different geometric performance for the same systems. 1 Other examples include Hartree-Fock (HF) and the local density approximation (LDA), which give reasonable structures despite their poor energetic performance. Nearly all electronic structure simulations (and beyond) begin with obtaining approximate geometries, making a proper and extensive quantification of geometric errors of key importance in computational chemistry. By using the standard tools for comparing structures, it is not easy to tell which of approximate geometries is better, as it requires comparing errors in 3N − 6 degrees of freedom. What typically happens is that some geometric parameters are more accurate in one method, some are better in another (see, e.g., Refs. 1-3). The geometric performance of electronic structure methods is commonly assessed by comparing averages of errors in these parameters (e.g., bond lengths, angles, distances from a chosen point in a molecule such as the center of mass, etc.) [3][4][5][6][7][8][9] But, when such metrics are used, the rankings of approximations strongly depends on a chosen metric as illustrated in detail in Ref. 1. By comparing errors in geometric parameters, it is not trivial to tell which of the approximations yields an overall better geometry even for systems with two degrees of freedom (e.g., water 1 ), as one approximation can beat the other for the first degree of freedom, but not for the second. Other ambiguities that affect the rankings based on this approach also arise, such as: whether one should average errors over all bond lengths/angles or only unique ones.
In a recent work, Vuckovic and Burke (VB) introduced a framework for quantifying and analysing errors in approximate geometries based on the concept of geometry energy offset (GEO). GEO provides a remarkably simple and intuitive approach for quantifying errors in molecular geometries, which by-passes the need to compare the errors in individual geometric parameters. For a given approximate geometry, GEO is simply defined as the difference in exact energies at the approximate and exact geometries: where E(G) is the ground-state energy at geometry G, G 0 the exact geometry andG an approximate geometry. The theoretically exact geometry is defined here as the exact minimum of the exact ground-state potential energy surface, and the Born-Oppenheimer approximation is assumed throughout this work. Defined this way, GEO represents an energetic distance between the exact and approximate geometries, and thus provides a single number measure for the quality of geometries. At minima, GEO vanishes only ifG = G 0 , and otherwise it is always a positive number given in energy units. Thereby, GEO provides an unambiguous measure for the quality of approximate geometries (the higher the GEO value, the worse the geometry). As such, it circumvents a need for comparing errors in possibly dozens of bond lengths and angles to rank approximations. These features make GEO an ideal quantity for assess-ing the quality of geometries of any molecule and any method so long as the reference geometries are available. Furthermore, GEO can also tell us what fraction of the total error is due to geometry, and what due to total energy. In Appendix A, we show the decomposition of the total error of an electronic structure calculation into geometric and non-geometric part. This decomposition has shown that geometric errors are typically small (but not negligible) part of errors for e.g., atomization energies, but can account for most of the error for weak interactions. 1 It also enabled identification of specific situations where better results are obtained by using less accurate structures due to error cancellations between the geometric and nongeometric part of the total error. 1 Computing GEO by Eq. 1 requires access to the exact geometries and exact energies at approximate geometries. For single-reference systems, CCSD(T) with large enough basis set provide accuracy that rivals experimental geometries, 10 so we can consider them 'exact'. VB used CCSD(T)/aug'-cc-pV5Z 11,12 as a reference to calculate GEO for a set of small molecules that had been earlier optimized by Karton and co-workers. 11 For somewhat larger systems (e.g., aromatics containing two benzene rings), running CCSD(T) geometry optimizations with a large basis set becomes too expensive. For this reason, VB had to rely on B2PLYP as a proxy reference, 13 given its nearly CCSD(T) performance for small molecular structures. In a more recent work, Bakowies and von Lilienfeld extended this analysis to a larger set of small molecules, and even built empirical corrections for the part of atomization energies error that is due to approximate geometries. 14 Here an important question arises: When accurate reference geometries are available, can the use of expensive single point calculations at approximate geometries needed to calculate GEO by Eq. 1 be by-passed? Such calculations make the benchmarking of geometries more expensive, and thus pose the restrictions regarding the molecular size. The answer to this question is yes, as assessing geometric performance of approximations by calculating GEO values can be greatly simplified by introducing its approximation: 1 For covalent systems, GEO is excellently approximated by GEO and provides essentially the same information (e.g., rankings of approximations, the decomposition of the error into contributions from the errors in different structural parameters, etc.) 1 At the same time, computing GEO is way less costly as it does not require running a single-point calculation with a high-level method, such as CCSD(T). Furthermore, if a reference geometry is derived from an experiment, we no longer need CCSD(T) at all, as for GEO we only need approximate energies at both reference and approximate geometries (Eq. 2). In practice this can be done by using accurate geometries as a starting point for a geometry optimization with an approximate method. After the convergence, one can easily calculate GEO from the differences in approximate energies from the first and last iteration of the optimization procedure.
In the present paper, the advantages of GEO over GEO are used for a systematic analysis of geometric performance of approximations for main-group structures. By using GEO in place of GEO in tandem with accurate semiexperimental geometries, 15-17 the analysis does not rely on the expensive CCSD(T) calculations nor on a proxy reference as it was the case in the previous study. 1 While the previous study focused on the GEO analysis at a fixed basis set, here we observe how the changes in a basis set affect the geometric performance of approximations. From this analysis, we find that some of the worst performers at a large basis sets are one of the top performers at a smaller basis due to error cancellations. We dedicate special attention to the geometric performance of different classes of DFT methods, and analyze how it varies with the amount of exact exchange. A harmonic approximation to GEO enables us to directly link and partition GEO into contributions from errors in specific geometric parameters. This analysis reveals different patterns for different classes of DFT approximations and tells us how these patterns change with a basis set size.
The paper is organized as follows. The stage is set in Sec. 2, where the the differences between GEO and GEO are examined by using the HCN molecule as an example. In the same section, the set of accurate semiexperi-mental geometries is validated for the purpose of calculating GEO in the present work. The main results are in the next two sections, with Section 3 focusing on the quantification of geometric errors and trends across approximations, and Section 4 focusing on the GEO analysis (a breakdown of GEO into contributions from different geometric parameters). The last section is devoted to conclusions and discussion on where the GEO approach should prove powerful in the future.
2 Setting the stage

The simple HCN example
To compare GEO with GEO and get a feeling how the errors in geometric parameters translate into GEO, we use the HCN molecule. The CCSD(T)(full) method is taken as a reference with the aug'-cc-pCV5Z basis set [cc-pCV5Z for the hydrogen atom and aug-cc-pCV5Z for the other two atoms 18 ], and (full) indicates that all electrons are included in the correlation treatment.
The right panel uses GEO values to rank various approximation for the HCN structure (black line). Both sets of GEO values, the one that uses values the CCSD(T) structure (magenta) and the one that uses semiexperimental (SE) structure (blue) are virtually indistinguishable from GEO . This is the case even though GEO costs way more to compute than GEO . The use of GEO in place of GEO enables us to bypass the use of expensive CCSD(T) energies to perform the GEO analysis. For GEO we also need reference G 0 (Eq. 2), but the use of SE geometries enables us to completely by-pass input from CCSD(T) and extend the GEO analysis to molecules whose structures cannot be obtained by CCSD(T)(full) within a sufficiently large basis set. In general, GEO is energetically very close to GEO even for inaccurate methods, such as HF, 1 as the curvature ofẼ (G) at the minimum is reasonable even for HF (compare Eqs. 6 and 7).
To see how the errors in geometric parameters of HCN translate to GEO, the GEO contours are plotted in the right panel of Figure 1. This is done by calculating the CCSD(T)(full) potential energy surface around the equilibrium geometry. The molecule is linear and all tested approximations get that right. Thus, in the contour plot we show only errors in the two geometric parameters: the C≡N (x-axis) and C−H (y-axis) bond lengths. The poorest performers are HF, M11-L, M06-HF, whose errors are beyond the shown range. First we note that the SE structure (blue dot) is in an excellent agreement with the reference. The B2PLYP geometry is in a very good agreement with both SE and CCSD(T) structures, and has negligibly small GEO value. It is commonplace to run CCSD(T) calculations with the frozen core (FC) approximation, and if this approximation is turned on for CCSD(T)/aug'-cc-pCV5Z, the structure is still accurate (green dot), but it is off the center. This confirms that apart from a large basis set one also needs all-electron CCSD(T) to rival the accuracy of the SE approach. 10 This also demonstrates the advantage of using the SE structures as running CCSD(T)(full) with a large basis can only be done for very small molecules. When it comes to the DFT approximations in the GEO contour plot, one can observe their clustering, as previously observed for water. 1 Here, the hybrid functionals are clustered together in the first quadrant (too short triple bond and accurate single bond), and metaGGA/GGAs are in the second quadrant (predicting too long bonds with larger errors for the single bond). M06-L 19 is an exception, but as we shall see later, its performance for geometric structures is more in-line with hybrids than with metaGGAs.

A dataset of semiexperimental geometries used for GEO evaluations
In the remainder of this work, reference geometries ('exact' G 0 ones entering Eq. 2) are taken from the B2se dataset of Barone and coworkers. 15 This dataset contains accurate equi-librium structures of 68 molecules that have been obtained from the SE approach. Within the SE approach, equilibrium structures are derived from experimental ground-state rotational constants in which vibrational contributions computed by a suitable quantum mechanical (QM) methods are subtracted. [15][16][17] Thereby, the SE structures match our definition of G 0 , as they correspond to the minima on the Born-Oppenheimer potential energy surface. In terms of applicability and accuracy, the SE approach offers a range of advantages for accurate structure determinations over only experimental or only theoretical approaches (see, e.g., Refs. 15,16). The B2se structures employed here are shown in the SI (see Fig. S1), and the phenyl radical, as the only open-shell B2se species is excluded and later separately analysed in Section 4.4. Barone and co-workers have built the B2se set by using B2PLYP/cc-pVTZ to compute vibrational corrections. 15 This level of theory gives nearly identical structures to those computed from CCSD(T) vibrational corrections. 15 For a subset of the B2se structures, vibrational correction have been computed at the CCSD(T) with at least a triple-ζ basis set and these are contained in the CCse set. 10 To validate the use of the B2se structures as a reference, we calculate GEO values for a range of approximate structures by using both B2se and CCse geometries as a reference. This is done for a subset of the B2se set for which the CCse geometries are available. The two sets of GEO values (B2se vs. CCse as a reference) are essentially the same (typically within 0.005 kcal/mol), as shown in   13 is the winner and HF has expectedly the worst performance. Hybrids typically perform better than semilocal DFT functionals. An exception is M06-L that has an excellent performance, which is more in line with hybrid functionals than with other tested semilocal functionals. On the other hand, M06-HF 20 performs much worse than other hybrids. LDA's GEO is comparable to that of PBE 21 and even better than BLYP. 22,23 This makes the geometric performance of LDA remarkable relative to its poor energetic performance (for, e.g., atomization energies). 21,24 M11-L, 25 as observed earlier by Jacquemin and co-workers, 26 displays very bad performance for molecular geometries and it is here just slightly better than HF.
Moving from the AVQZ to AVTZ panel (middle), the rankings are mostly preserved with some exceptions. B2PLYP is still the winner, HF is still the worst, but MP2 now loses to most of the hybrids. The gap between hybrids and semilocal functionals is now bigger. The exception is M06-L again, which now beats all tested hybrids.
Much more abrupt changes in the rankings are seen as one goes from AVTZ to AVDZ (the right panel): M06-2x 27 is now the best, MP2 is behind LDA and beats only PBE and BLYP, and B2PLYP is now worse than all of the hybrids. Surprisingly, the performance of HF strikingly improved, and it is now in front of all semilocal functionals except for M06-L. In- of that of HF/AVQZ and a half of MP2/AVDZ. This clearly suggest that this surprisingly good geometric performance of HF/AVDZ is due to the error cancellation between the absence of correlation and a large basis set incompleteness error.
To shed more light on the different GEO trends with the basis set size, in Figure 3 we show the GEO boxplots for MP2, B3LYP, M06-2x and HF at the three basis sets (for the same plot for other methods, see Figure S4). Each of these approximations behaves differently. MP2 is a disaster at AVDZ and then it dramatically improves at AVTZ. A significant improvement is also observed as one goes from AVTZ to AVQZ. In the case of B3LYP, the spread is also larger as the basis set decreases but expectedly much less drastic than it is the case with MP2. M06-2X is interesting because its GEO trends change very little with the basis set size. This is also the case with other Minnesota functionals except for MN15-L 28 (see Figure S4, where we show the same boxplots for all tested approximations). HF displays a unique trend here: the spread becomes smaller as the basis set size decreases. This behaviour has also an impact on the GEO performance on hybrids, and in Section 3.4, we study in more detail how the amount of exact exchange affects the geometric performance of hybrids.
Since the inclusion of diffuse functions in a basis set is often too expensive for the geometry optimizations, we also test here what happens with the GEO performance of approximations as one goes from the AVnZ to VnZ basis set. The results are shown in Figure 4, where in each of the panels the ranking is kept of the respective AVnZ basis set. From this figure, we can see that the omission of the diffuse functions does not much affect the rankings of approximations, but it typically increases average GEO . Expectedly, it does more so as one goes from AVQZ to AVDZ. In the case of hybrids, the AVnZ to VnZ change worsens the average GEO by no more than 5% when n=Q or n=T, but it can worsen by ∼40% when n=D.

How does dispersion corrections affect DFT GEO values for the B2se set?
In Figures 2-4, the DFT approximations are employed without empirical dispersion corrections as these have a little effect on the geometric performance given the relatively small size of the B2se molecules. To see the effect of the dispersion correction for the B2se geometries, in Figure 5 we show a parity plot comparing GEO of TPSS (a metaGGA) enhanced by D3(BJ) against that of bare TPSS. D3(BJ) denotes Grimme's dispersion correction 29 with the dumping function of Becke and Johnson. 30,31 The GEO values for BLYP and TPSSh 32 are also shown for comparison. We can see from the plot that the addition of D3(BJ) has almost no effect on the GEO values of TPSS. On the contrary, the effect of exact exchange addition is way more profound as the GEO values of TPSSh, which contains 10% of where,D =G HG , whereH is the Hessian ofẼ(G) at theG minimum.D is here the absolute GEO scale, which by Eq 3 gives us GEO value when the exact geometry is compressed (or expanded) by γ. ues are within 1% of CCSD(T) ones and even HF gives reasonable values. 1 Thus, theD values are always computed here at the B3LYP/AVTZ level, and are reported for the B2se molecules in Figure S1 . As said, the main idea ofD is to give a scale on which GEO does grow with a molecular size. Thus, we define: by equating GEO of Eq. 2 with GEO γ of Eq. 3. In this way, GEO from a given approximation would also be obtained if γ were a relative error in all bond lengths and there were no errors in angles.
If γ is now used in place of GEO to rank the approximations, the rankings are preserved with few exceptions when the differences in numbers are marginal. This is illustrated in Figure 6, where we repeat Figure 2, but with bars denoting mean γ values of approximations, which range from ∼ 0.2% to ∼ 1.7%. If one wants to estimate GEO values for a molecule that is larger than those of B2se, it can be done from: GEO ∼ γ 2D /2, by using the average γ values reported in Figure 6 and K of that specific molecule.

How does geometric perfor-
mance of DFT approximations vary with the amount of exact exchange?
In this section we focus on how the geometric performance of DFT methods varies with their amount of exact exchange. For this purpose, we employ the α-PBE hybrid which is built from the PBE functional by replacing the α amount of PBE exchange with the same amount of exact exchange. At α = 0.25, α-PBE becomes PBE0. 34,35 The mean B2se's GEO values for α-PBE/AVnZ (n={Q,D,T}) are shown in the top panel of Figure 7 (note the log-scale in the y-axis). In terms of the shape of the curves, their ranges, and the position of the minima, AVTZ curve is similar to AVQZ, but very different from the AVDZ. Up to α ∼ 0.3, the mean GEO is lower with AVTZ than with AVDZ. At larger α values (α greater than ∼ 0.3), AVDZ becomes more accurate than AVTZ. This behaviour of α-PBE can be traced back to the geometric performance of HF, given that: (i) α-PBE becomes more similar to HF as α approaches 1, (ii) HF's GEO is much lower with AVDZ than AVTZ. The minima of the AVQZ and AVTZ curves are at α ∼ 0.2, whereas that of the AVDZ curve is expectedly shifted towards larger α (α = 0.36). Around these values we can also find most of the optimal α values for GEO of the individual B2se molecules (see Figure S9). For example, more than 70% of the optimal α values within the AVTZ basis set for the individual B2se structure lie in between 0.17 and 0.24. If we look at the mean γ (in place of mean GEO ) as a function of α, the minima are still at about the same α values (see the inset in the top panel of Figure 7).
The bottom panel of Figure 7 zooms in on the region about the minima of the AVDZ and AVTZ curves, and there we add datapoints of the mean GEO values of functionals containing different amounts of the exact exchange. From there we can see that the performance of TPSSh and B3LYP is nearly the same as that of respective α-PBE. On the other hand, Minnesota  functionals are typically better than the respective α-PBE functional (note that Minnesota functionals are designed so that their exchange and correlation parts fit each other). Only MN15/AVDZ gets beaten by α-PBE within the same amount of exact exchange and basis set, given the very good performance of α-PBE/AVDZ in the region around that α value. Finally, α-PBE at the optimal α = 0.36 value outperforms all functionals with AVDZ. When VDZ is used in place of AVDZ, the curve has nearly the same shape but is shifted upwards by ∼0.1 kcal/mol (see Figure S8). In Figure 8, we also show the GEO boxplots of α-PBE for the B2se at the following α values: 0, 0.21 (the minimum of the AVTZ curve), 0.36 (the minimum of the AVDZ curve), 0.5. From these boxplots, we can see that at the two smaller α values the GEO spread within AVDZ is larger than that of AVTZ & AVQZ, whereas the situation is reversed at the larger two α values. This also suggest that α-PBE at α = 0.36 and with AVDZ provides great geometric performance relative to its cost. For the same plot, but with a full range in the y-axes, see Figure S11.
In view of their excellent performance to cost ratio, the '-3c' family of composite methods developed by Grimme and co-workers is becoming more and more popular. 3,[36][37][38] The family includes: HF-3c, 36 PBEh-3c, 37 B97-3c, 38 and the most recent r 2 -SCAN-3c 3 ('a Swiss army knife'), which is built upon the original r 2 -SCAN of Perdew and co-workers. 39 These methods are characterized by small, but carefully chosen set of atomic basis functions, and classical potentials designed to correct their electronic structure part. Boxplots with GEO values of the four 3c methods for B2se are shown in Figure 9, with B3LYP/AVTZ and B3LYP/AVDZ boxplots shown for comparison. From this figure, we can see that with way larger GEO values, HF-3c stands out from the other three 3c methods. These three 3c methods have similar mean GEO values, with B97-3c having the lowest mean GEO , and r 2 -SCAN-3c having the smallest spread. The three 3c methods are much better than B3LYP/AVDZ, but get beaten by B3LYP/AVTZ. Thus, they overall give highly accurate B2se structures relative to their high efficiency.

GEO analysis: a break-
down of GEO into components from errors in geometric parameters

GEO decomposition
By expanding E(G) around its G 0 minimum up to second order, we obtain the harmonic approximation to GEO: 1 where H is the Hessian of E(G) at the G 0 minimum. We can write GEO in the same way, by expandingẼ(G) around itsG minimum: The difference between GEO h and GEO is defined here as the measure of the anaharmonicty of GEO : which, as we shall see later, is negligibly small for our molecules. Even a further simplification can be obtained if we useH q in a given set of internal coordinates and neglect its off-diagonal elements. Then GEO h becomes: where ∆q i =q i − q are the errors in internal coordinates, andf q i,i are the diagonal elements of theH q Hessian in internal coordinates (force constants). In this way, Eq. 9 enables us to partition GEO into the contributions from the errors in geometric parameters in internal coordinates (bond lengths, bond angles, torsion angles, etc.). We also define: which is a contribution to GEO from the coupling of internal coordinates (due to typically small, but generally non-zero off-diagonal elements ofH q ). Combining Eqs. 8-10, we can write: Here, Eqs. 8-11 apply to GEO , but one can define analogous equations for GEO. As we shall see, the third term on the r.h.s. of Eq. 11 accounts for most of GEO , and this enables its intuitive interpretation in terms of the errors in individual geometric parameters. Instead of using internal coordinates to partition GEO , one can also use the errors in GEO normal modes (the eigenvectors ofH in e.g., Cartesian coordinates). 1 The advantage of the latter analysis is that GEO s becomes exactly equal to GEO , but the errors in these coordinates are less chemically intuitive than errors in e.g., bond lengths. This is why we proceed with the GEO analysis in internal coordinates. To illustrate the quantities of Eq. 11, we take a butadiene molecule as an example. In Figure 10, we rank the approximations within AVQZ based on their GEO s values. In the same plot, we also show |∆ A + ∆ C | and |∆ A | (note the log-scale in the y-axes). From this figure we can see that |∆ A | is negligible relative to GEO s , and that ∆ A + ∆ C sum also accounts only for a small fraction of GEO . This means that most of GEO can be directly linked and partitioned into the contributions from the errors in specific geometric parameters. This breakdown for the same butadiene molecule is shown in Figure 11. Based on Eq. 9, we partition GEO s into contributions from errors in the single and double bond lengths and the remainder of GEO s here is due to the errors in the angles (the sum of contributions from both bond and torsion angles ). We can also see how these GEO s components vary as we change the basis sets: AVQZ (left panel), AVTZ (middle panel), and AVDZ (right panel). The contribution from the angles is small for all approximations and basis sets. The GEO weights do not change substantially as we go from AVQZ to AVTZ as it is the case with the total GEO values. Interesting patterns and grouping of functionals can be observed based on their GEO contributions. At the AVQZ and AVTZ level, we can see that the single-bond contribution strongly dominates old-school (semi)local functionals (LDA, BLYP, PBE, TPSS). The situation is different in the case of hybrids, for most of which a double-bond contributions strongly dominates their GEO . As we go from AVTZ to AVDZ the relative contributions to GEO change strikingly and so do the rankings of approximations. An immediate noticeable change is that the light blue colour becomes more dominant in the AVDZ bars, indicating a more significant contribution from the errors in single-bond lengths. This change in the basis also reverses the trends in hybrids, where the contribution from singlebonds dominates over that of the double-bonds. On the other hand, a single-bond contribution still dominates GEO of PBE, TPSS and BLYP, with the difference that GEO of these functionals within AVDZ bears a substantial doublebond contribution. As observed earlier, the ge- ometric performance of HF gets drastically better as one goes from AVTZ to AVDZ and we can see here that this change in the basis substantially reduces both single-and double-bond contributions to its GEO . At the AVDZ basis, HF gets beaten here only by M06-2X, which also contains a large amount of exact exchange (54%).

Illustrations
The same analysis for more molecules is shown in Figure 12. Here the basis set is fixed (AVTZ), but a more detailed analysis with different basis sets and additional molecules is given in Section S6 of the SI. In the case of acetylene molecule [panel (a)] the patterns are even clearer than it was the case with butadiene. Namely, GEO of LDA, PBE, BLYP and TPSS is almost entirely due to the errors in the triple bond length, whereas GEO of hybrids are almost entirely due to the single bonds. In the case of benzene [panel (b)], the picture is more nuanced: GEO of GGAs, LDA and TPSS is still mostly due the error in the single bond, but has a significant contribution from the errors in the unsaturated C-C bond lengths. By looking at the remaining two panels in Figure 12, the similar patterns can be observed: the angle contributions to GEO s is still small, GEO s of hybrids is dominated by a single bond component, whereas unsaturated bonds dominate GEO s of the PBE/BLYP/TPSS group. The -L Minnesota functionals (ones that make no use of exact exchange) behave differently: the GEO weights of MN15-L are similar to those of the PBE/TPSS/BLYP group, whereas the weights of M06-L and M11-L are more similar to those of hybrids. Of course, if the AVDZ basis is used instead, the trends change, as was described by the butadiene example (see Section S6 for the results for the other basis sets).

4.3
Breakdown of GEO from α-PBE for the formaldehyde molecule We could see in Fig.12 that the GEO weights of PBE and PBE0 are substantially different. To shed more light on that and gain insight into the position of the α minimum for α-PBE in Figure 7, we consider here how the α-PBE GEO components vary for the formaldehyde molecule. This molecule is chosen as its optimal α value is about the same as that for the whole B2se dataset (about 0.2 at the AVTZ basis set). The results are shown in Figure 13. From the top panel, where AVTZ is used, we first note that the GEO curve is accurately described by that of GEO s . Furthermore, the angle contribution to GEO is very small, and thus the GEO black curve is essentially the sum of its single-bonds and double-bond components (the light blue and blue curves). Up to α ∼ 0.15, most of GEO comes from the single bonds, whereas in the region for α values in between ∼ 0. 35  (α ∼ 0.09) than to that of the single-bonds component (α ∼ 0.45), as the latter minimum is shallower. The large distance between the blue and light blue minima do not allow α-PBE to be very accurate for both single-and double-bond of formaldehyde at the same time. Thus, at the α ∼ 0.2 minimum there is some contribution from both bond types. In the bottom panel of Figure 13, we repeat the same plot, but within AVDZ. The situation in this panel is similar to that of the upper panel, with the difference that the position of the minima of the four curves are now shifted towards larger α values (only the angle between the bonds is most accurate at α = 1 for both basis sets). This observation is in-line with the earlier analysis that focused on the changes in the total GEO values as one goes from AVTZ to AVDZ.

GEO analysis for the phenyl radical
As said earlier, the phenyl radical is the only open-shell species in the B2se set. That is why it was excluded from the GEO statistics given in Section 3, and we analyse it separately here. Figure 14 tween these two plots is the position of MP2, which was one of the top performers for benzene, whereas it stands out as by far the worst for the phenyl radical. The MP2 calculation here is based on the unrestricted HF (UHF), which is severely spin-contaminated ( S 2 off by more than 50% for the considered radical), and in such cases MP2 typically gives very bad geometries. 1,40,41 Although spin-contaminated, UHF itself gives way more reasonable geometry for the phenyl radical than MP2 [Figure 14(a)]. The inset of Figure 14(a) explores GEO values for different methods built upon MP2. It is of no surprise 42 that GEO of SCS-MP2 40 is just slightly lower than that of MP2, as SCS-MP2 also uses the UHF orbitals here. On the other hand, B2PLYP and the two orbital-optimized (OO) MP2 approaches (OO-MP2 and OO-SCS-MP2 43-45 ) reduce spin contamination of MP2, and thus give highly accurate structures for the phenyl radical (GEO values ∼0.05 kcal/mol). One should also note that B2PLYP is the cheapest of the three methods as its cost is about the same as MP2, while orbital optimizations make the two OO-MP2 approaches substantially more expensive than MP2.
In Figure 14(b), we decompose GEO s into the contributions from C−H and C −−− − C bonds, and angles. These results for the phenyl radical can also be compared with those for the benzene molecule [ Figure 12(b)]. From this plot, we can see that large GEO of MP2 comes almost exclusively from the errors in the C −−− − C bond lengths. Instead of summing contributions for the same bond types, we can also measure the GEO contributions from each bond. While a more detailed analysis is shown in the SI (Fig. S34), we focus on MP2 in Figure 12(c), where for each unique bond length we show errors in pm. We can also see how each of these translate to GEO contributions, which are obtained by squaring the error and multiplying it by half of the underlying force constant (Eq. 9). We can see that the MP2 errors in C −−− − C bond lengths are very large and range from ∼2.5pm to ∼3pm resulting in GEO contributions from 0.3 to 0.5 kcal/mol per bond. The errors in the C−H bond lengths are much smaller, and result in way smaller GEO s contributions given that: GEO grows quadratically with the errors in geometric parameters and given the smaller value of the f C−H than f C −−− − C force constant. In contrast to the large MP2 errors in C −−− − C bonds, those of B2PLYP are much smaller (within 0.16 pm resulting in negligible GEO contributions, ∼0.001 kcal/mol).

Conclusion and outlook
The present paper demonstrates the usability of GEO for quantifying and analysing geometric errors in approximate molecular structures. The use of GEO in place of GEO in tandem with semiexperimental geometries greatly simplifies the whole analysis and enables us to bypass a need for using the input from expensive correlated wavefunction calculations, e.g., CCSD(T). With the GEO analysis, we identify patterns in geometric performance across different classes of approximations and basis sets. The focus here is on main-group structures, but the developed tools are widely applicable and can be used in a straightforward way to quantify and analyse geometric errors for any molecule and any approximation in simple and chemically intuitive terms.
Both GEO and GEO use energy units to assess qualities of approximate geometries (unlike other measures for geometric errors, such as averages in errors in geometric parameters). Several advantages arise from that. First, GEO and GEO can be directly compared to existing energetic scores of electronic structure methods or can be included in new ones. For this reason, we recommend the inclusion of mean GEO values for the B2se set to new versions of energetic scores, such as WTMAD-2 scores pertaining to the GMTKN55 collection of databases. 46,47 Sec-ond, mean GEO for B2se can also be included in the training of new empirical methods, since the resulting methods would likely have better geometric performance than those using the same form but trained only on standard energetic datasets (e.g., datasets with atomization energies, barrier heights, binding energies, etc.) 6,48 Here the focus is on ground-state structures, but in the future work GEO will also be calculated for excited-state structures providing tests for TD-DFT and wave-function methods. 26,49 The same or slightly adjusted analysis will also be applied to noncovalent structures, 5,50 transition states (enabling quantification of geometric errors for barrier heights), as well as the structures of large transition-metal complexes obtained from semiempirical methods, for which DFT structures should be sufficiently good reference. 51 In the context of DFT, improved densities should yield improved geometries. 52 Thus, in addition to the standard DFT, we will also test the geometric performance of its densitycorrected variant. 48,53-55

Associated content
The Supporting Information is available free of charge at..., and it contains computational details and additional results references through-out the main text (PDF). Data for GEO values of approximations for the B2se molecules are also provided (xlsx).
Acknowledgement This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sk lodowska-Curie grant agreement 101033630, and in part by NSF (CHEM 1856165). Computational resources for the present project have been provided through NWO's Vici grant 724.017.001. I also thank Dr. Peter Kraus for suggesting the use of semiexperimental geometries in the context of the present work.

A Further details on GEO and GEO
For an approximate electronic structure method, the total error is given by: and thus contains errors both due to the approximate geometry and approximate energy (see Section 1 for the definition of the quantities in Eq 12). To decompose this error into GEO and non-geometric part ('purely energetic components', denoted by P and P below), we add and subtract E G to the r.h.s of Eq. 12: (13) Adding and subtractingẼ (G 0 ) to the r.h.s of Eq. 12 we obtain an alternative form of Eq. 13: The signs of GEO and GEO also dictate the following chain of inequalities: Since GEO is typically very accurately approx-imated by GEO , then we also have: GEO ≈ 1 2 (P − P ). As discussed in Ref. 1, for equilibrium structures (minima of potential energy surfaces), GEO and GEO are always positive, whereas their signs for transition states (the first order saddle points of potential energy surfaces) are not definite.