Setting up the HyDRA blind challenge for the microhydration of organic molecules

The procedure leading to the first HyDRA blind challenge for the prediction of water donor stretching vibrations in monohydrates of organic molecules is described. A training set of 10 monohydrates with experimentally known and published water donor vibrations is presented and a test set of 10 monohydrates with unknown or unpublished water donor vibrational wavenumbers is described together with relevant background literature. The rules for data submissions from computational chemistry groups are outlined and the planned publication procedure after the end of the blind challenge is discussed.
and the wavenumber at 3657 cm −1 should be predicted. If a three-quantum resonance is predicted, estimated deperturbed water OH b wavenumbers after removal of this resonance may also be provided. For n (OH) > 2 (alcohols), we recommend that the theory groups report all OH stretching fundamentals, because there may be mode mixing between the hydrogen-bonded alcoholic and water OH bonds. Alternatively, we recommend reporting the mode which shows the strongest downshift when H 2 16 O is replaced by H 2 18 O.

tend to cancel for good reasons 8 , paving the way for helpful approximations. The properties of monoisomeric systems are valuable for a first calibration of absolute quantities, which are typically more challenging because they require highly accurate treatments of electronic structure and nuclear dynamics effects. It is rewarding that there are numerous rotational spectroscopy studies of monohydrates with more or less direct structural and sometimes relative energy information available in the literature 5,9-11 . These provide a lot of detail on the shallow potential energy landscapes of solvation phenomena.
A currently underexplored proxy to the hydrogen bond structure and energy of microhydrates is the wavenumber of the hydrogen-bonded OH stretching vibration ν OH b of any docking water serving as a hydrogen bond donor 12 . As a single number for a given monohydrate complex which can be derived from vibrational spectroscopy in a straightforward way, it offers a set of advantages over more demanding structural studies. For multifunctional substrates, linear IR and Raman spectroscopy can provide an immediate survey over competing hydrogen bond docking sites due to the simplicity and intensity of the vibrational fingerprint of the OH group 13,14 .
Double resonance techniques allow for rigorous size and conformational distinction 15 , relaxation experiments can identify the most stable conformation 16 , and 18 O labelling of water can unambiguously discriminate from other hydride stretching modes in the vicinity 17 . The spectral position of the OH stretching vibration is very sensitive to the strength of the hydrogen bond, spanning about two orders of magnitude in relation to the typical spectral width of an isolated vibrational transition.
A theoretical model which successfully predicts such a single observable number like ν OH b is likely to initially get the right answer for the wrong reason, as multidimensional anharmonicity is not easy to model rigorously. This is particularly true if the experimental answer is known beforehand, due to a multitude of available and conceivable model variants and parameters. Therefore, large training sets, blind test components and iterative refinements are essential ingredients on the path towards systematic success for the modeling of OH stretching wavenumbers of monohydrates. Even a largely empirical model with a high success rate for a selected observable can be useful for the experimental gas phase cluster community. This is still close to the current status, where each observable and sometimes even research group has its favourite recipes (functionals, basis sets, scaling factors) to assist experiment. There may be more or less systematic correlations between the observed wavenumber shifts and experimental or computed proton affinities and gas phase basicities 18-20 or calculated hydrogen bond lengths 21 , to name just a few possible links to be exploited. Ultimately, one must hope for a number of systematically successful models which connect theory and experiment on multiple observables, not only the vibrational wavenumber. Meaningful approximations such as elaborate scaling techniques 22,23 , local mode models 24,25 , parameterised harmonic DFT predictions 26 , polarisable force fields 27 or hierarchical models building on high level treatments of the smallest systems or subgroups 28 may be identified or we may even witness the victory of machine-learning approaches 29 towards such challenges.
While blind challenges have become popular in different chemical subcommunities such as protein 30 and crystal structure prediction 31 or physicochemical data for drug molecules in solution 32-35 , we are not aware of a blind challenge explicitly addressing vibrational spectroscopy data as a rigorous connecting point between theory and experiment. This may also be due to the fact that this connection relies on challenging electronic structure and vibrational anharmonicity issues at the same time. However, this is also subtly true for observables like molecular structure 36 and energy differences 37 , and therefore it appears timely to investigate a bolder case.
The HyDRA challenge is meant to kick off such a systematic approach to microhydration, with numerous conceivable follow-up options. HyDRA stands for Hydrate Donor Redshift Anticipation, where redshift refers to the wavenumber downshift of the donor OH vibration relative to the symmetric stretching fundamental of the water molecule 38,39 at 3657 cm −1 . The switch between absolute values and relative trends is thus trivial in this case, because it only involves adding or subtracting this experimental monomer value. In the present work, we use the sign convention that a positive downshift describes a lower wavenumber in the complex than in monomeric water. Here, we outline the construction of an initial training and test set for neutral organic monohydrates which was recently offered to theory groups 40 .
Further mandatory entries of every submission included the proposing researcher and institution (not necessarily from the group publishing the experimental data) as well as the spectroscopic detection, wavenumber calibration, and cooling techniques used. Concerning the observed wavenumber, the submitter had to estimate its accuracy, to predict the relevant docking site for the observed water molecule and to assess whether the monohydrate was the global minimum arrangement of the two complex partners.
Optional entries for further assessment and use of the submissions beyond the initial training included supporting researchers like coworkers and endorsing colleagues as well as spectroscopic data on the free (or otherwise engaged) OH stretching vibration of the hydrate water (ν OH f ), the wavenumber and relative intensity of a potential resonance partner to ν OH b (possibly stealing intensity and shifting its band center), and any isotope substitution information or information on observed docking isomers.
Because it is essential for the challenge that the training set data are well secured, it was possible to list additional assignment measures concerning the organic molecule conformation, the docking site, and the size assignment to a monohydrate, as well as to comment on any other relevant issue.
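The downshift convention introduced above (positive when the complex absorbs at a lower wavenumber than the water monomer) can be written out explicitly. The following Python lines are a minimal sketch; the function names and the example wavenumber are hypothetical and serve only as illustration, whereas the reference value of 3657 cm −1 is the one given in the text.

```python
# Symmetric stretching fundamental of monomeric water in cm^-1 (value from the text).
WATER_SYM_STRETCH_CM1 = 3657.0


def downshift(nu_ohb_cm1: float) -> float:
    """Downshift of the donor OH stretch; positive = lower wavenumber than the monomer."""
    return WATER_SYM_STRETCH_CM1 - nu_ohb_cm1


def absolute_wavenumber(downshift_cm1: float) -> float:
    """Inverse conversion from a downshift back to an absolute band position."""
    return WATER_SYM_STRETCH_CM1 - downshift_cm1


# Hypothetical donor OH stretching wavenumber, not an experimental value.
example_nu = 3500.0
example_shift = downshift(example_nu)
print(example_shift)                       # 157.0
print(absolute_wavenumber(example_shift))  # 3500.0
```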
Based on more than a dozen submissions from half a dozen countries over half a year, a small committee consisting of a PhD student, an undergraduate research student and two experimental spectroscopy group leaders made a selection of 10 systems to be recommended to the theory community for the training of their models. Among others, methanol 14 was not included in the core training set because of its very large amplitude internal motion and a capped phenylalanine 41
The 10 systems selected for the HyDRA training set are shown in Figure 1 together with the three-letter acronym used in this work and with their CAS registry number. Each of them fulfills a number of desirable properties for this initial training process (Table 1). All are believed to have water acting as a hydrogen bond donor in the global minimum structure, and all contain fewer than 100 electrons to facilitate accurate quantum-chemical modeling. All except APH (where IR spectroscopy was required to detect the second isomer 42,45 ) have only one experimentally observed monohydrate conformation. For all except IMZ, the free OH stretching wavenumber of the solvating water is also known experimentally. All except for the methyl ketones ACE and APH with their low barrier methyl torsions are fairly stiff, with no monomer fundamental wavenumber significantly below 30 cm −1 . For most monohydrates of the training set, there is more than one spectroscopic study in the literature, and in a few cases (ACE, IMZ) there are also isotope substitution experiments. In particular, many monohydrates (APH, ANL, CBU, DBF, IMZ) have been characterised structurally by rotational spectroscopy 45-49 .
In two cases (ACE, APH), there is a well-established vibrational b2lib resonance 42 (a resonance of the water OH stretch with the bending overtone (b2) and a hydrogen bond librational (lib) mode) which makes these systems particularly interesting for advanced anharmonic models, but models not including such a resonance can also be applied, because we provide the deperturbed OH stretching wavenumber as well. In two other cases (ANL, DBN), there may be an analogous resonance, but this remains experimentally open and therefore the deperturbed value for OH b spans the experimentally observed main transition. To analyse such resonances, it is essential to avoid embedding of the monohydrates in matrices, because matrices may also induce site splittings which are difficult to distinguish from a resonance 50 .
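To illustrate what a deperturbed band position means in the simplest case, the following Python sketch applies a generic two-level model. It assumes a single dark state (standing in for the b2lib combination) coupled to a bright OH stretch which carries all the intensity. This is a textbook approximation and not necessarily the procedure used to derive the HyDRA reference values; the function name and the example numbers are hypothetical.

```python
import math


def deperturb_two_state(nu1, nu2, i1, i2):
    """Two-level deperturbation with one bright and one dark zero-order state.

    nu1, nu2: observed band centres in cm^-1
    i1, i2:   observed relative intensities of the two bands
    Returns (bright, dark, coupling), all in cm^-1.
    """
    total = i1 + i2
    # The intensity fraction of each band equals its bright-state character,
    # so the zero-order bright state is the intensity-weighted centroid.
    bright = (i1 * nu1 + i2 * nu2) / total
    dark = (i2 * nu1 + i1 * nu2) / total
    coupling = abs(nu1 - nu2) * math.sqrt(i1 * i2) / total
    return bright, dark, coupling


# Hypothetical doublet: bands at 3500 and 3520 cm^-1 with a 3:1 intensity ratio.
bright, dark, coupling = deperturb_two_state(3500.0, 3520.0, 3.0, 1.0)
print(bright, dark, round(coupling, 2))  # 3505.0 3515.0 8.66
```

In this picture, a model without the resonance should be compared with the bright zero-order position, whereas a model including it should reproduce both observed bands.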
In terms of experimental techniques, 5 monohydrates were characterised by UV/IR double resonance techniques (in different laboratories in France, Germany, Japan and Spain) and 6 by FTIR spectroscopy (in Germany), all in supersonic jet expansions. Six compounds are aromatic, one is a radical, three are ketones and two are alcohols. The hydrogen bond acceptor atoms include O, F, N and π-systems.

Experimental test set
For the experimental test set of vibrationally uncharacterised monohydrates, a multifaceted approach was chosen. Because their experimental data have to be secured and kept secret until the end of the blind challenge, some of the considered systems were already experimentally characterised in the research groups forming the selection committee, but not yet published.
Others were at least pre-explored to increase the likelihood of a successful assignment during the runtime of the blind challenge. Furthermore, an informal call to microwave spectroscopy groups was made to collect suggestions for systems which had already been characterised structurally, to minimise the risk that a subsequent vibrational characterisation meets unexpected difficulties.
This approach led to a preselection of 15 monohydrate proposals, for which there is no gas phase vibrational spectroscopy record in the literature. From those, the final selection of 10 official test systems was made one month after the publication of the training set, by the same selection committee. These 10 target systems for the blind challenge are summarised in Figure 2 (ordered by number of electrons) and Table 3 (in alphabetical sequence).
Each target system, ordered by increasing number of electrons and thus computational complexity, will be briefly discussed in the following to introduce the most relevant literature sources for structural and other information.
For FAH as the simplest molecule in the test set, it is long established by theory [74][75][76] and experiment 62 that water coordinates asymmetrically. The OH stretching vibration which is of interest in this challenge has been determined in matrix isolation 61,77 , but to the best of our knowledge not in the gas phase without environmental influence. In cryogenic matrices, split signals were found (3580, 3585 cm −1 in neon and argon, 3573, 3578 cm −1 in a nitrogen matrix).
These can give a first orientation, but the vacuum-isolated transition may be located above, below or in this range and the splitting may be lifted if it is caused by matrix interaction. Whether the complex is planar or quasiplanar is difficult to determine with the available experimental data 62 , but we expect to provide a single experimental vacuum-isolated water OH b stretching
In contrast to the THF situation, the monohydrate of THT has been structurally characterised in much detail 70 , also with respect to the water orientation relative to the ring. We are not aware of a complementary vibrational gas phase study and thus consider this to be a valuable diversification of the test set, also in preparation for more challenging thio compounds 83 .
In TFE, as in many fluorinated alcohols 72 , the first solvating water acts primarily as a hydrogen bond acceptor, but the symmetric stretching mode is only subtly shifted from the water monomer value by this interaction and by an additional donor contact to the CF 3 group. This provides valuable benchmark information for small complexation shifts, but the IR activity of such weakly perturbed symmetric OH stretching motions in water molecules is low. Therefore, Raman spectroscopy 14 is expected to provide key vibrational information in this case. Structurally, the preference of the monohydrate for an insertion complex is well established 73 and the related, structurally more diverse case of hexafluoroisopropanol is also well studied 84 . By including the simplest model of a peptide co-solvent 85 monohydrate into the vibrational target systems, the characterisation of their unusual biochemical properties can be supported 86 .
For MLA with its weak internal hydrogen bond between the hydroxy group and (preferentially) the carbonyl group, a solvent water has different options to coordinate. It can insert into the internal hydrogen bond, or add to either end, or even to the ester oxygen. Due to the activation barrier for the first process, jet experiments have to make sure that insertion is not kinetically hindered 87 . This is important, because theory predicts insertion to be most stable. In this context, it is essential that there is experimental microwave evidence for the inserted structure 63 , in which the solute also directs the dangling water hydrogen to one side.
The competition from lactate conformations with the carbonyl group pointing away from the hydroxy group is intense in aqueous solution 64 , but not yet in the monohydrate. In contrast to jet studies of the methanol complex 63 and matrix studies of the ammonia complex 88 , we are not aware of matrix isolation or jet studies of the OH stretching vibration of the monohydrate.
For the urea derivative DMI, a solution study 59 has revealed broad absorption maxima of the monohydrate in 1,2-dichloroethane (3431 cm −1 ) and CCl 4 (3460 cm −1 ) solution, downshifted by 162 and 155 cm −1 , respectively, from the monomer symmetric OH stretch of water in the same solution. We are not aware of structural data for the monohydrate in the gas phase, but there is evidence for involvement of the C=O group in the hydrogen bond 59 and the DMI monomer has been characterised by rotational spectroscopy 60 .
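The solution-phase numbers quoted above refer to the water monomer in the same solvent, not to the gas phase value of 3657 cm −1 . A short Python check, using only the band positions and downshifts given in the text, makes the implied solvent-specific reference wavenumbers explicit. The variable names are illustrative.

```python
# (band maximum of the DMI monohydrate, reported downshift), both in cm^-1, from the text.
dmi_solution_data = {
    "1,2-dichloroethane": (3431.0, 162.0),
    "CCl4": (3460.0, 155.0),
}

# Implied symmetric OH stretch of monomeric water in each solvent.
implied_reference = {
    solvent: band + shift for solvent, (band, shift) in dmi_solution_data.items()
}

for solvent, reference in implied_reference.items():
    print(f"{solvent}: {reference:.0f} cm^-1")
```

Both implied references lie well below the gas phase monomer value, which is one reason why the solution data can only give a first orientation for the vacuum-isolated complex.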
Monomeric CON has been structurally characterised by microwave jet spectroscopy and shown to exist in a dominant boat-chair conformation with non-equivalent lone pairs of the oxygen 57 . The energy difference to the next conformation is so large that monohydration is unlikely to switch the energy sequence. Therefore up to two isomers of the monohydrate are expected 58 and the more stable one is of interest in the present challenge. As the energy balance between the two docking isomers could be subtle, we recommend calculating both and providing at least the donor OH stretching wavenumber of the more stable one. We are not aware of any vibrational study of the monohydrate of CON.
TPH is an unsymmetric ketone, in which the water molecule is strongly directed to one of the C=O lone electron pairs. Its monohydrate was recently accurately characterised by rotational spectroscopy and theory 71 but we are not aware of a published gas phase vibrational study.
Comparison to the well-studied non-fluorinated parent compound APH from the training set will reveal the influence of fluorination through space and through several bonds.
PCD as the most complex member of the test set is of stereochemical and hydrogen bond topological interest. Note that the two OH groups are cis-configured, whereas the absolute chirality is not relevant for the monohydrate. The diol forms two weak, but cooperative hydrogen bonds (OH-OH and OH-π) which give rise to separate OH stretching transitions in CCl 4 solution 65 . The water in the monohydrate has several options to attach or insert into this hydrogen bond chain and the most stable conformation is sought. In this particular case, we recommend an extensive search among the possible conformations to identify the global minimum conformation. It will be characterised by IR/UV spectroscopy and isotope labelling. We are not aware of previous gas phase work on this system.

Description of experimental procedures
All experimental data of this blind challenge refer to translationally and rotationally cold molecules (T < 20 K) without any significant environment, but with some residual vibrational excitation in soft modes from the initial thermal population. There is the possibility of conformational trapping behind barriers, which may partially avoid relaxation to the global minimum structure. This is the typical situation in adiabatic carrier gas expansion experiments into vacuum, either free 91 or skimmed to form a molecular beam 92 , with a starting temperature of
solutes are chosen such that relaxation into the global minimum structure was typically proven experimentally, or is at least likely. Constitutional relaxation is evidently not considered. Thus, the decomposition of formaldehyde into CO and H 2 is disregarded for the global minimum criterion, although it is exothermic by about 9 kJ mol −1 at low temperature 93 . The solutes always remain in the constitution given in Figure 2.
The spectroscopic methods used to probe the monohydrates may be divided into action and linear techniques. In action techniques, the effect of the photons on the monohydrates is probed, e.g. ionisation, dissociation or fluorescence. In linear techniques, the effect of the monohydrates on the photons is probed, e.g. direct infrared absorption 13 or inelastic (Raman) scattering 85 . Action techniques often offer much higher sensitivity, and they can be size-selective and conformationally selective 15,94-96 .

Challenges and experimental difficulties
Linear techniques probe all the molecular systems present in the expansion, based on their photon absorption or scattering cross sections. They can usually identify size (e.g. hydration number) by variation of the expansion conditions and by following the signal evolution, at least for 1:1 complexes if those do not spectrally overlap with other cluster compositions. Conformations can only be distinguished if they differ in their spectral fingerprint, sometimes with the help of isotope substitution. There is a less liberal molecular size limit for linear techniques due to the decreasing vapour pressure and the limited sensitivity. Action techniques may in turn suffer from unwanted competing processes such as fragmentation and may still be problematic for many non-aromatic systems and even some aromatic systems with fast processes in the electronically excited state 97,98 . The best is always a combination of both techniques, such as in the case of POH from the training set. This is, however, unrealistic on the time scale of this challenge and for the choice of molecules. It is anticipated that one member of the test set (PCD) will only be characterised by an action technique (IR/UV double resonance), whereas the others will be mostly accessed by direct infrared absorption. For the latter, a recent instrumental development 99 is very helpful for monohydrates, because its gas recycling allows for the use of expensive isotopologues, solutes and carrier gases. In some cases, where the water vibration of interest has a low infrared intensity or may be distributed over several OH stretching modes in the complex, linear Raman spectroscopy 14,85 can provide the required information.
Aggravated by the partial double-blind character of the challenge, it cannot be strictly ruled out that a training or test set member has to be removed completely from the challenge, because experiments or their interpretations reveal a major problem. Such problems may include the discovery of new monohydrate or even monomer conformations which complicate an unambiguous assignment 37 , issues with the cluster size assignment, experimental downtimes due to instrument failure, commercial availability bottlenecks, etc. However, we expect that such problems do not exceed a 10% threshold and thus do not significantly reduce the significance of the challenge.

Theory group involvement
Theory groups and other researchers interested in the systematic prediction of OH stretching wavenumbers for monohydrates were alerted about this blind challenge via news groups, conferences, and direct contact, together with the training set of 10 organic solvates. They were given the possibility to register for the challenge, to be alerted immediately upon the start of the actual competition with the announcement of the test set. Registrations up to the submission deadline are accepted. A unique entry code is provided to each participant in order to allow them later to review their own submitted data.
Although it is not a strict rule, the calculations for all molecules in both sets (test and training) should be carried out as consistently as possible. This would mean that the level of theory at all different stages (conformer search, optimisation, calculation of frequencies, etc.) should be the same. The same also applies to non-structure based methods. For example, if a regression is used, the same model should be applied to all systems. However, it is to be expected that some parameters of the protocol might change. For example, in some systems conformational searches might be skipped altogether, or carry different parameters due to the range of molecular sizes. It is also possible that some geometry optimisation criteria will be adapted. These computational details are expected to be provided at the end of the challenge.
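As an illustration of the consistency requirement for regression-type models, the following Python sketch fits a single least-squares scaling factor on training data and applies it unchanged to a test system. All wavenumbers and the system label are hypothetical placeholders, and a one-parameter scaling is only one of many conceivable models.

```python
def fit_scaling_factor(harmonic, experimental):
    """Least-squares factor c minimising sum((c * harmonic - experimental)**2)."""
    numerator = sum(h * e for h, e in zip(harmonic, experimental))
    denominator = sum(h * h for h in harmonic)
    return numerator / denominator


# Hypothetical training data in cm^-1 (not HyDRA values).
train_harmonic = [3700.0, 3650.0, 3600.0]
train_experimental = [3552.0, 3504.0, 3456.0]

scale = fit_scaling_factor(train_harmonic, train_experimental)

# The same fitted model is applied to every test system without re-tuning.
test_harmonic = {"XYZ": 3680.0}  # hypothetical test system label
test_predicted = {name: scale * nu for name, nu in test_harmonic.items()}

print(round(scale, 4))                   # 0.96
print(round(test_predicted["XYZ"], 1))   # 3532.8
```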
Assistance to all the participating groups is provided throughout the challenge duration, strictly avoiding any direct contact between the theory and the experimental groups, in the true spirit of a double-blind challenge.
One cannot rule out that we have overlooked published cold gas phase vibrational work on a system in the test set, or that such work appears on the scientific record before the experimental HyDRA data are released after the end of the challenge. This could be due to activities of other experimental groups not involved in the challenge, or due to data leakage from our experimental groups despite substantial effort to keep the results confidential among the contributing experimentalists. In such a case, the affected test set member will be shifted to training set status, because one cannot rule out that some model-proposing participants had a knowledge advantage over others. The goal is clearly to keep the blind character as unquestionable as possible 32 .

Mandatory data submission
The mandatory part of the submissions for this blind challenge will be compact. For each of the training and test set members, a single predicted wavenumber for the hydrogen-bonded water OH stretching mode in the most stable monohydrate in cm −1 must be provided in a standardised online form. We strongly recommend providing an error estimate together with this number. This can be based on an actual statistical analysis (e.g., from the errors observed in the training systems) or even on personal convictions. Each participant group has the freedom to select which systems to compute (and ultimately to submit data for). In other words, all submissions will be accepted even if incomplete.
This compact table of up to 20 numbers (plus uncertainties) must be accompanied by a detailed description of the rational prediction method employed, to be included as a separate, autonomous file in the supplementary information of the planned publication. The description must contain all the details required for the reproduction of the results, such as the employed methods and program packages, with suitable references. Scaling parameters, employed atomic masses and any other parameters and computational keywords required for reproduction must be provided. This document will be in the full responsibility of the submitting authors and should include all the names and affiliations of the contributing co-authors as well as the role of each co-author in the submission. The use of ORCID ids is strongly recommended.
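One simple way to arrive at the recommended error estimate from the training systems is the root-mean-square deviation of the training predictions, which is then attached to each test prediction. The Python sketch below assumes this choice; the numbers and system labels are hypothetical, and other statistical measures are equally conceivable.

```python
import math


def rmsd(predicted, observed):
    """Root-mean-square deviation between predicted and observed wavenumbers."""
    residuals = [p - o for p, o in zip(predicted, observed)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothetical training results in cm^-1 (not HyDRA values).
train_predicted = [3510.0, 3490.0, 3560.0, 3400.0]
train_observed = [3500.0, 3500.0, 3550.0, 3410.0]

uncertainty = rmsd(train_predicted, train_observed)

# Hypothetical, deliberately incomplete table of test predictions:
# system label -> (predicted wavenumber, error estimate), both in cm^-1.
test_predictions = {"AAA": 3545.0, "BBB": 3480.0}
submission = {name: (nu, uncertainty) for name, nu in test_predictions.items()}

print(uncertainty)  # 10.0
```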

Optional data submission
Optional parts of the submission can be, in the sequence of decreasing importance for the goals of this blind challenge, and in free format to be included in the supplementary information:
We recommend always using the explicit isotopic masses for best method comparison 100 . Some programs use masses averaged over natural isotopic composition by default. This may have relatively small consequences for the current low resolution vibrational spectra and systems (still up to 5 cm −1 for vibrations involving sulfur due to a ratio of 1.003 between the natural atomic mass and the main isotope mass), but is clearly detrimental when comparing to high resolution (rotational) spectroscopy in the future.
are not considered. Geometries, when available, may be provided.
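The sulfur mass ratio mentioned above can be reproduced from tabulated masses. The Python sketch below also gives an upper bound for the resulting wavenumber change in the limiting case that the effective mass of a mode is entirely determined by sulfur, using the harmonic relation that the wavenumber scales with the inverse square root of the mass. Real modes involve lighter atoms as well and shift less. The function name and the mode wavenumber in the example are hypothetical.

```python
import math

S32_ISOTOPIC_MASS = 31.97207       # u, mass of the main isotope 32S
S_STANDARD_ATOMIC_WEIGHT = 32.06   # u, abridged natural-abundance average

mass_ratio = S_STANDARD_ATOMIC_WEIGHT / S32_ISOTOPIC_MASS


def limiting_shift(nu_cm1: float, ratio: float = mass_ratio) -> float:
    """Upper-bound wavenumber decrease when the averaged mass replaces the
    isotopic mass, assuming the mode's effective mass is purely sulfur."""
    return nu_cm1 * (1.0 - 1.0 / math.sqrt(ratio))


print(round(mass_ratio, 3))               # 1.003
print(round(limiting_shift(1000.0), 2))   # shift for a hypothetical 1000 cm^-1 mode
```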
After the official submission deadline for model contributions, there will be a limited time window for the correction of submissions. Once this window is closed, the raw experimental data will also be made publicly available on a data repository, such that the challenge is closed on both ends.

Evaluation of the blind challenge
The submitted OH stretching data will be carefully evaluated and compared with the experimental values in different ways. Correlation plots for training, test and combined sets will be produced and analysed for all complete submissions, with a possible focus on which set is predicted better. Where several incomplete submissions focus on the most simple systems, which is to be expected for high-level theoretical treatments, analysis of such subsets with benchmark character will be attempted.
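A comparison of this kind could be sketched as follows in Python. The function restricts the statistics to the systems that a (possibly incomplete) submission actually covers and returns the mean absolute error, the mean signed error and the Pearson correlation coefficient. The choice of metrics, the function name, the system labels and all numbers are assumptions made for illustration and do not represent the evaluation protocol of the challenge.

```python
import math


def evaluate(predicted: dict, experimental: dict) -> dict:
    """Compare predictions with experiment for all systems present in both."""
    common = sorted(set(predicted) & set(experimental))
    p = [predicted[name] for name in common]
    e = [experimental[name] for name in common]
    n = len(common)
    errors = [a - b for a, b in zip(p, e)]
    mean_p, mean_e = sum(p) / n, sum(e) / n
    covariance = sum((a - mean_p) * (b - mean_e) for a, b in zip(p, e))
    spread = math.sqrt(
        sum((a - mean_p) ** 2 for a in p) * sum((b - mean_e) ** 2 for b in e)
    )
    return {
        "n": n,
        "mae": sum(abs(x) for x in errors) / n,
        "mean_signed_error": sum(errors) / n,
        "pearson_r": covariance / spread,
    }


# Hypothetical experimental values and an incomplete submission, in cm^-1.
experimental = {"AAA": 3500.0, "BBB": 3550.0, "CCC": 3600.0, "DDD": 3450.0}
submission = {"AAA": 3510.0, "BBB": 3540.0, "CCC": 3610.0}

result = evaluate(submission, experimental)
print(result["n"], result["mae"])  # 3 10.0
```

Running the same function separately on training and test systems would show directly which set is predicted better.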

Publication of the results
The results will be published in a relatively compact many-author manuscript with extensive supplementary information in the responsibility of separate experimental and computational author subgroups, planned to be part of a PCCP themed collection on benchmarking in 2022.
The final selection of the methods and results included will be subject to different criteria including diversity of methods, sophistication of the approach and total number of 'coincident' submissions. Our expectation is, however, that all submissions will be considered. The submission of the data does not change the ownership of the data, the latter remains exclusive to the participating group. Any participants whose results are not included are still welcome to submit their work independently. Participants selected to participate in the joint paper are expected to respond to two rounds of manuscript reviews and to remove inconsistencies and gaps. Failure to reply within a two-week period effectively confirming co-authoring could result in exclusion from the joint publication. The publication should be viewed as the starting point for numerous follow-up activities by the individual groups, such as completing or improving their predictions in a non-blind fashion. Groups which did not meet the submission deadline are invited to present their results independently as well. Some experimental assignments may have to be revisited in the light of systematically conflicting predictions. Optional data provided by the computational approaches may help to better understand the spectra. The double-sided theory-experiment interplay will continue, as usual in the field. All these activities profit from a fast publication of the primary blind challenge results.

Post-publication activities
After publication of the results of the HyDRA blind challenge, the training and test sets can be extended to more strongly and even more weakly bound monohydrates, to charged systems (where more powerful experimental techniques are available 101-103 ), to dihydrates, to isotope effects, to other vibrations, or to further observables such as structures and relative energies.
These extensions may be targeted by different groups and will hopefully lead to a continuous refinement of the theoretical modelling of the microhydration of organic matter, identifying the most powerful and universal models in the field.
There are indeed many interesting monohydrates which were not considered for the present challenge for various reasons. This includes systems where the global minimum structure is hidden behind a large monomer isomerisation barrier 104 , and some systems which are closely related to members of the training or test set 42,105 . It also includes the numerous systems where the global minimum structure involves water as a pure acceptor. In terms of acceptor elements beyond O and N, there are many extension possibilities towards S 83,106,107 which remain to be explored. For the structurally well characterised acid hydrates 108,109 , one may expect complex vibrational dynamics whenever the OH stretching excitation comes close to degenerate proton transfer barriers.

Conclusions
We have described a first systematic blind challenge initiative in the field of hydrogen bond-
We see this direct match between electronic structure theory and vibrational spectroscopy as an important complement to more traditional and well-established blind challenges in the field of protein 110 or crystal structure 111 prediction, where subtle energy differences also relate to zero point or thermal vibrational energy and thus anharmonicity treatments, besides getting the potential energy hypersurfaces right. This is even more obvious for the field of room temperature hydration 112 , where an atomistic method which systematically predicts correct free energy differences for a good reason is likely to predict reliable vibrational shifts of monohydrates as well. However, there is a long path of convergence between rigorously atomistic and other, more empirical or approximate or machine-learning models which are more easily extendable to macroscopic samples. Clearly, there is plenty of room in between. Blind challenges may be among the most objective procedures to find out where we stand.