SAMPL7 Challenge Overview: Assessing the Reliability of Polarizable and Non-Polarizable Methods for Host-Guest Binding Free Energy Calculations

Abstract

The SAMPL challenges focus on testing and driving progress of computational methods to help guide pharmaceutical drug discovery. However, assessment of methods for predicting binding affinities is often hampered by computational challenges such as conformational sampling, protonation state uncertainties, variation in test sets selected, and even lack of high quality experimental data. SAMPL blind challenges have thus frequently included a component focusing on host-guest binding, which removes some of these challenges while still focusing on molecular recognition. Here, we report on the results of the SAMPL7 blind prediction challenge for host-guest affinity prediction. In this study, we focused on three different host-guest categories: a familiar deep cavity cavitand series which has been featured in several prior challenges (where we examine binding of a series of guests to two hosts), a new series of cyclodextrin derivatives which are monofunctionalized around the rim to add amino acid-like functionality (where we examine binding of two guests to a series of hosts), and binding of a series of guests to a new acyclic TrimerTrip host which is related to previous cucurbituril hosts. Many predictions used methods based on molecular simulations, and overall success was mixed, though several methods stood out. As in SAMPL6, we find that one strategy for achieving reasonable accuracy here was to make empirical corrections to binding predictions based on previous data for host categories which have been studied well before, though this can be of limited value when new systems are included. Additionally, we found that methods using the AMOEBA polarizable force field had considerable success for the two host categories in which they participated. The new TrimerTrip system was also found to introduce some sampling problems, because multiple conformations may be relevant to binding and interconvert only slowly. Overall, results in this challenge tentatively suggest that further investigation of polarizable force fields for these challenges may be warranted.

[...] can be particularly affected by polarization. Additionally, anions such as iodide and bromide are highly polarizable, as are anions with phosphate or sulfate moieties, which are present in a wide range of biomolecules [30,31]. Phosphates and sulfates play important roles in biological functions and interactions, and are present in drug-like molecules [31]. On the other hand, small cations have low polarizability but can still strongly polarize their environment when it is polarizable.

Much molecular modeling uses classical fixed-charge force fields without an explicit accounting for polarization [31]. Such two-body additive force fields are implicitly polarized to hopefully match a level of polarization appropriate on average for condensed-phase simulations [32-34]. This is true for common force fields in the AMBER, CHARMM, GROMOS and OPLS families (e.g. GAFF [35,36], OpenFF [37], CGenFF [38-40], and OPLS [41,42]); these neglect explicit polarization for computational efficiency. It is possible that the approximations made by these fixed-charge force fields may result in particularly large errors in systems like those examined here [43].

Polarizability may also be particularly important for these systems due to the water model. Specifically, with fixed-charge force fields, the water model is also non-polarizable, which may be an especially bad approximation for systems like these where water interactions with a buried hydrophobic cavity are at play [43]. The expectation is that binding in host-guest systems like those examined here is heavily influenced by the hydrophobic effect, and the hydrophobic effect will certainly be strongly influenced by properties like polarizability.

Fixed point charge water models are limited in some ways by their use of the same partial charges to empirically fit the potential energy landscape and dipole moment, two distinct water properties [44,45].
Inevitably, the choice in water model (many listed in [46,47]) may also dictate the accuracy in (a) solvation, (b) dielectric constant, and (c) dipole moment [44], and affect ionic behavior along with many other properties. Previous work in the Gilson lab indicated that even fixed-charge water models can vary dramatically in water placement and orientation around hosts as well as in thermodynamic properties like the enthalpy of binding [48,49], and it seems likely that polarizable models may exhibit even larger differences.

Polarizable force fields potentially help address some of these concerns and challenges. PIPF (polarizable intermolecular potential function) and AMOEBA were among the first polarizable force fields developed, and have been in development since the 1990s [46]. Polarizable force fields, and their importance for such systems, are explained in Section 2. In addition, popular general force fields such as AMBER, OPLS, GROMOS, and CHARMM are continuously evolving and polarizable versions of some of these are available [46]. One example of the latter is a recent release of CHARMM's balanced Drude polarizable force field [31]. However, polarizable force fields have been applied relatively seldom in SAMPL challenges; AMOEBA was used in some prior host-guest challenges [11], but the Drude polarizable force field has yet to be used in a SAMPL challenge.

In other words, polarizable force fields add additional complexity to the physical model used in describing these systems, potentially providing additional accuracy but with additional computational cost. However, for some host-guest systems, this may be particularly important for several physical reasons. First, these systems often exhibit strong electrostatic interactions in a buried, relatively hydrophobic environment, meaning that the precise degree of polarization and environmental shielding may be a key determinant.
Polarizability may affect the strength of charge-charge interactions, and may strongly modulate the shielding effect of the environment. Additionally, the hydrophobic effect can be a key determinant of binding, and this is also likely strongly modulated by polarization of the water and host.

Polarizable force fields have shown some promise in prior SAMPL challenges. In the SAMPL6 host-guest challenge, a method using the AMOEBA force field was employed on CB8 with 14 guests ranging from small organic molecules to larger drug-like compounds, including approved drugs. The initial results had an ME and RMSE of 2.63 and 3.62 kcal/mol respectively, and interestingly, this method was able to correctly identify questionable host-guest complex ratios of CB8 with guests 11 and 12 [11]. The correct respective ratios for these systems were 1:1 and 1:2, and these were a bonus challenge in SAMPL6. Binding of guests 2 and 3 (palonosetron and quinine) was significantly overestimated, presumably because (a) the AMOEBA parameters for the host resulted in single and/or double indentation of the macrocycle and (b) conformers of flexible guests became locked during solvation in water versus binding in the solvated complex [11]. In subsequent studies, revised AMOEBA results showed improved ME and RMSE of 1.20 and 1.68 kcal/mol respectively, though this was after challenge results were released. In total, 8 of the 15 predicted free energies were within 0.65 kcal/mol of experiment, while the predictions for the palonosetron and quinine guests were in better agreement with experiment.
The improvements were attributed to two factors: (a) the values of key torsion parameters for C(N)-C-amide N-carbonyl carbons of CB8 and CB7 were adjusted to improve the flexibility description of the host ring system and (b) a double annihilation scheme of electrostatics and van der Waals with annihilation of key guest torsions yielded much better conformational sampling and hence predictive accuracy. However, through the SAMPL6 challenge we had yet to see AMOEBA dramatically outperform other methods prospectively.

Differences in system treatment could also be important in some cases
Bruce Gibb's labs. There have been several analogs of these two families since host-guest systems first appeared in SAMPL3.

SAMPL7 includes several analogs in the cyclodextrin [60] family thanks to Michael Gilson's lab.

Study of these various systems, in SAMPL and elsewhere, can help provide insight into the particular challenges each system presents. However, conclusions are not always clear; sometimes, performance remains highly variable across several challenges. Particularly, performance in prior SAMPL challenges was highly variable by method and target, and no clear method emerged as reliable across all systems or most systems. Both SAMPL3 and SAMPL4 included some guests in the cucurbituril family [15, 64], with the best RMS errors typically being around 2.5 kcal/mol unless empirical corrections were included [61,65], and no method stood out across both challenges [17]. SAMPL4 also included cavitands. In SAMPL5, the best RMS error was closer to 3 kcal/mol [61], but correlation with experiment for this approach was not good. Methods based on explicit solvent and electronic structure calculations were noted to appear relatively consistent and generally provide the greatest reliability across all SAMPL challenges [66], but also had considerable room for improvement. In general, predictions for cavitands seemed to be modestly more accurate whereas clip-based hosts have been more challenging in prior challenges (like CBClip in SAMPL5 [66]). Thus, in the present challenge, we hoped to learn whether we might see a method or methods with significantly improved accuracy relative to prior challenges, and whether one might emerge that performs reasonably well (e.g. RMS error under 3 kcal/mol) across multiple host classes, as this has not typically been the case in prior challenges.

The SAMPL7 host-guest challenge involved three different systems or categories which we explain here: one focusing on cucurbituril derivatives, one focusing on Gibb deep cavity cavitands (GDCCs), and one focusing on modified cyclodextrins.

Cucurbiturils are a common and relatively well-studied system for host-guest binding [9] which have been featured in some prior SAMPL challenges.

Many cucurbiturils (CB[n]s) have been synthesized by the Isaacs Lab, and several featured in previous SAMPL challenges. The potential applications of cucurbiturils include use as solubilizing excipients for insoluble drugs, sequestrants for drugs of abuse and neuromuscular blockers, and pH triggered delivery agents [58]. Hosts in this family typically have a molecular structure containing n glycoluril units connected via 2n methylene bridges, forming a barrel shaped macrocycle with a central hydrophobic cavity. In addition, cucurbiturils contain electrostatic carbonyls protruding out from the hydrophobic cavity.

In the SAMPL7 challenge, the host is not a classic cucurbituril: instead of being a macrocycle, it is a clip-shaped molecule based on similar chemistry. Specifically, the host is an acyclic cucurbituril clip composed of a glycoluril trimer capped with aromatic triptycene sidewalls at both ends (here called TrimerTrip, as it is a trimer of glycoluril units with triptycenes), and four sulfonate solubilizing groups protruding out from the sidewalls (Figure 1) [58]. The sulfonate groups also enhance ion-ion interactions with cationic guests [67], which are typical cucurbituril binders. Acyclic CB[n]-type receptors often take on a C-shape due to their increased flexibility [58,67,68]. Experimentalists synthesized acyclic cucurbiturils with the aim of increasing the binding strength and capacity for different guests, including macrocyclic guests.

Typically, CB[n]-guest complexes have very high affinity, especially for charged hydrophobic ammonium guests similar to those of the SAMPL7 challenge (Figure 1). This high affinity is due to the presence of intracavity waters lacking a full complement of hydrogen bonds. The lack of hydrogen bonds is known to provide an enthalpic driving force for binding to macrocyclic CB[...]

[...] lead to convergence problems. (2) The timescales of wetting and dewetting events may be large compared to typical simulation timescales. In CB7, when gradually decoupling a guest there is a large fluctuation of waters in the host cavity. The latter occurs when the guest is partially decoupled and may also lead to convergence problems.

Previous studies of cucurbiturils, including CB7, have highlighted the importance of host and guest sampling, salt effects, and water model. Sampling of the CB7 host is thought to be straightforward because it is fairly rigid. However, guest binding modes might be challenging to adequately sample, especially for the more flexible guests. In the presence of buffer and/or [...]

Figure 1. Structures of the TrimerTrip host and guest molecules for the SAMPL7 Host-Guest Blind Challenge. The acyclic CB[n]-type receptor, TrimerTrip, is shown on the top. It is composed of a glycoluril trimer with aromatic triptycene sidewalls at both ends, and four sulfonate groups to increase its solubility. The host takes on a C-shape and binds guests inside the cavity. The guests for the SAMPL7 challenge have the characteristics of typical CB[n] binders. The guests are grouped here, with the aliphatic chains on the left, and the cyclic and aromatics on the right.

While TrimerTrip is a distinct host, it shares substantial similarity with these previous receptors and we expect it to exhibit relatively similar behaviors in binding to guests.

[...] The difference between the two hosts is the location of 4 carboxylates around the cavity opening. For OA the carboxylates are protruding out of the cavity while for exo-OA they are at the cavity entrance (Figure 2).

GDCCs have been used in SAMPL3-7 and there is much experimental data [9, 43, 59, 72] and insight available. This family of hosts binds guests with a hydrophobic moiety that fits the pocket and a hydrophilic group which points out towards the solvent [9]. The GDCCs have been shown to bind diverse guests varying in polarity, positively and negatively charged, as well as organic [...]

[...] While the carboxylates protrude outward away from the cavity in OA, in exoOA they are at the rim of the cavity opening. The guests for SAMPL7 are named g1-g8, with four guests with a carboxylate group, and four with a quaternary ammonium group. For the OA host, guests g1-g6 have binding free energies which were previously reported and thus calculation of values was made optional for participants.

[...] host and/or guest. Acidic guests could be protonated, or two of the propionate groups could retain an acidic proton because they are in close proximity and can hydrogen bond. At the rim of the cavity a guest may also modulate protonation state of the neighboring carboxylates.

While typical SAMPL host-guest challenges have focused on binding of a series of guests to one or two hosts, one unique aspect of this portion of the challenge is that it focuses on binding of just two guests to a series of related hosts.

Previous studies on CDs ( -CD, -CD, and mono-3-carboxyproponamido-CD) report two distinct bound states for each host-guest pair. The first bound state, called the "primary orientation", has the guest polar group (i.e., alcohol, ammonium, carboxy- [...]

[...] GAFF v2.1 better models the flexibility of -CD compared to the SMIRNOFF99Frosst and GAFF v1.7 force fields also examined.

The guests studied here have been reported to bind native -CD, mono-3-carboxyproponamido--CD, and -CD substituted

The SAMPL7 host-guest blind challenge was organized so participants could submit a ranked submission, a non-ranked submission, or both for any or all of the three host-guest systems. Participants were advised to submit their best method as their ranked submission since only one ranked submission was allowed, as detailed below.

Participants were provided with pre-prepared host and guest structures, with SMILES strings, mol2, PDB and sdf files provided for all compounds. We made an effort to provide reasonable protonation states, etc., but also provided disclaimers that participants should carefully consider the choice of protonation state, etc. All provided data/instructions are available in the SAMPL7 GitHub repository (https://github.com/samplchallenges/SAMPL7).

Figure 4. CD host structures describing primary and secondary guest binding orientations. CD and its derivatives are known to bind guests in two orientations, primary and secondary. The primary binding orientation is when an asymmetric guest's polar head group projects out towards the glucose primary alcohol or the larger opening (up). The secondary binding orientation is when a guest's polar head group projects towards the secondary alcohol or the smaller opening (down).

Participant submissions followed a prescribed template and included predicted values and uncertainties, as well as method and participant information and other details. All submission files are available in the GitHub repository. Predicted values were optionally allowed to include binding enthalpy.

Only ranked submissions were considered in challenge analysis. Groups were able to submit multiple submissions, but needed to designate additional submissions as non-ranked. Non-ranked submissions, or additional submissions, allow "benchmarking" of methods. For example, for a particular method a participant can change one parameter in their methodology (e.g. charging method, host conformer, guest pose, water model, etc.) to ascertain its impact on predictions. In previous challenges, participants were allowed multiple ranked submissions; the shift to a single ranked submission per participant is new to SAMPL7.

This change was made to reduce the potential for multiple shots on goal to be more fair to groups which only submit one set of [...]

As noted above, we provided input files in a variety of formats. Participants were advised that (a) further equilibration of the host with the guest might or might not be needed (for TrimerTrip, we pre-equilibrated the host structure as discussed in Methods) and (b) to exercise their best judgment on the state modeled (e.g. protonation, conformer, binding mode, etc.). In essence, part of the host-guest challenge for some systems included binding mode prediction.

In this section we give details of our own reference calculations. These reference calculations were informally part of the challenge and used as additional methods for comparison. These calculations were also conducted blindly and were informally submitted as a "non-ranked" category, as they do not constitute a formal part of the challenge but are provided as a point of [...]

Initially, test simulations were done with the goal to determine if we could identify and apply a reasonable single protocol to run all host-guest systems. However, due to the guest formal charges and the diversity of the hosts and guests we guessed that successful protocols (especially lambda spacings) would be system dependent. For the simulations, harmonic distance restraints were used to allow the guest to explore the cavity and different modes since the binding mode of some guests was unknown. We ended up choosing two reasonable protocols, varying in number of lambda windows (with all other simulation parameters kept consistent), with one being for systems with neutral guests and a second for guests with a formal charge. The protocol for neutral guests had 31 lambda windows and was based on a previous protocol used on -CD with cyclopentanol as the guest. This protocol was tested on -CD with 4-methyl-cyclohexanol as the guest.
For systems with a charged guest, we ran a test free energy calculation using YANK's automatic pipeline to determine the best alchemical path (lambda windows [...]

The "neutral guest" protocol described above (31 lambda windows) was used to run all simulations in the cyclodextrin dataset with guest g1, for 16 ns per lambda window, when free energy estimates appeared converged. On the other hand, the "charged guest" protocol (61 lambda windows) was used for the remaining host-guest systems across all datasets since all other guests bore a formal charge. In this case, simulations were run until free energy estimates apparently converged or up to 30 ns per lambda window, whichever came first. First, to determine feasible cross application of the "charged guest" protocol to different systems (GDCC and TrimerTrip datasets), the charged protocol was tested on OA-g2 and clip-g11. Experimental data for OA-g2 was available from a previous SAMPL challenge, so this was an ideal system to test the protocol. The OA-g2 test resulted in a predicted free energy within 4 kcal/mol of experiment, after running the simulation to 26 ns per window. A health report for the OA-g2 simulation showed reasonable mixing between replicas, and there was apparent convergence. However sampling of replicas in individual states [...]

Reference calculations were conducted using GAFF parameters and AM1-BCC charges. GAFF parameters and guest AM1-BCC charges were assigned using Antechamber, and AM1-BCC charges for the host were assigned using the OpenEye toolkits because Antechamber could not charge the hosts. The starting poses were determined by docking via AutoDock/Vina and the top scoring pose was selected. A host-guest complex was manually created in tLeap and TIP3P was used to solvate the host-guest [...]

[...] All quantities are reported as point estimate ± statistical error from the ITC data fitting procedure. The upper bound (1%) was used for errors reported to be < 1%. We also included a 3% relative uncertainty in the titrant concentration assuming the stoichiometry coefficient to be fitted to the ITC data [1] for the Isaacs (TrimerTrip) and Gilson (cyclodextrin derivatives) datasets, where concentration error had not been factored in to the original error estimates. For the OA/exo-OA sets, provided uncertainties already included concentration error. In some cases, exoOA-g1 binding constants were not detected (ND) by ITC or 1H NMR. Binding of guest g2 to exoOA was very weak so only 1H NMR spectroscopy could produce reliable free energy data.

[...] and 6 non-ranked (Table 5). For a large portion of methods submitted, docking was used to obtain starting structures. General classical fixed charge force fields with no explicit polarization treatment were commonly used, as has become common in SAMPL host-guest challenges. Most simulation-based methods used explicit solvent, and used the TIP3P, TIP4P-Ew, and OPC water models. For this challenge one method did explicitly treat electronic polarizability. In addition, quantum mechanical methods were used in 2 of the datasets (GDCC and Cyclodextrin). Alchemical free energy techniques were employed in many cases, with anal- [...]

[...] and respectively (Figure 6). The mean error (ME) for this AMOEBA submission was modestly larger in magnitude than one of the other ranked submissions, but in all other respects its performance was superior. Full statistics are in Table 3. AMOEBA-based approaches also perform well in the GDCC category, as we will see below.

Here, the AMOEBA/DDM/BAR method predicted 10/16 binding affinities within 2 kcal/mol, the majority of these being within 1 kcal/mol (as discussed in the SAMPL7 virtual workshop [43]; full data available in our GitHub repository). The outliers for this method were clip-g6, clip-g7, clip-g8, clip-g9, clip-g11, and clip-g17, for which binding affinities were predicted to be too unfavorable. The FSDAM/GAFF2/OPC3 method predicted 10/16 within 2 kcal/mol and host-guest system outliers were clip-g3, clip- [...]

We sought to determine whether some hosts/guests are particularly challenging to predict, across all ranked methods, so we examined the RMSE and ME by host and guest for ranked free energy predictions for all individual host-guest systems. This is shown in Figure 8. The ranked predictions of all methods for the TrimerTrip/"clip" host-guest systems (shown in blue in Figure 8) were in general the most problematic, especially clip-g6, clip-g9, clip-g10, clip-g11, clip-g18, and clip-g19 which had an RMSE of about 4 kcal/mol or greater. All of the guests with an adamantane moiety fall within this list of "problematic" molecules.
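The per-system analysis described above (RMSE and ME for each host-guest system, pooled over all ranked methods) can be illustrated with a minimal sketch. The data values, the method names, and the function `per_system_errors` are hypothetical and are not taken from the challenge; the sign convention (predicted minus experimental) is also an assumption and may not match the one used in Figure 8.

```python
import math
from collections import defaultdict

# Hypothetical illustration data in kcal/mol; NOT values from the challenge.
experimental = {"clip-g1": -6.0, "clip-g2": -8.0}
predictions = {  # method name -> {system -> predicted binding free energy}
    "method_A": {"clip-g1": -5.0, "clip-g2": -6.0},
    "method_B": {"clip-g1": -7.0, "clip-g2": -4.0},
}

def per_system_errors(predictions, experimental):
    """Return {system: (RMSE, ME)} pooled over every method that predicted it.

    Assumed sign convention: error = predicted - experimental.
    """
    errors = defaultdict(list)
    for method_values in predictions.values():
        for system, predicted in method_values.items():
            if system in experimental:  # skip systems without a reference value
                errors[system].append(predicted - experimental[system])
    stats = {}
    for system, errs in errors.items():
        rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
        me = sum(errs) / len(errs)
        stats[system] = (rmse, me)
    return stats

stats = per_system_errors(predictions, experimental)
for system, (rmse, me) in sorted(stats.items()):
    print(f"{system}: RMSE = {rmse:.2f}, ME = {me:.2f} kcal/mol")
```

Pooling errors by system rather than by method highlights targets that are hard for every approach, which is the question this part of the analysis asks.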

The computed binding affinities for these host-guest systems are mostly too weak with ΔG ME of -2.5 kcal/mol or greater, the exception being clip-g10 which was predicted to be too favorable with a ΔG ME of 2 kcal/mol.

The GDCC dataset, which includes OA and exo-OA host-guest systems, had the most submissions, probably because this host is [...]

[...] and 0.83 respectively. Essentially, the latter approach seems to have done slightly better at ranking compounds for binding than the AMOEBA-based approach, but with a slope which is systematically incorrect. Full performance statistics are in Table 2.

[...] The Ponder group's data suggests that the quality of torsional parameters for the upper rim's diphenyl ether torsions can change predictions by 3-4 kcal/mol. In our reference calculations, we observe this guest folding in on itself and becoming effectively bulkier, which may mean host torsional parameters play a larger role for this particular guest.

On the other hand, the B2PLYPD3/SMD-QZ-R quantum method had larger prediction errors for guests with a positive charge.

Similarly, for the OA-g7 system which contains a positive guest, the method had a ΔG prediction error of 5 kcal/mol. These [...]

[...] The former had the slope closest to 1 and its RMS error was among the lowest, whereas the latter performed better on error and correlation metrics but had a lower slope which was systematically off (see Table 3).

[...] some will likely be more strained/less populated than others) and do not relax back on simulation timescales.

To address these issues, the Ponder group used a separate set of free energy calculations to compute the binding free en- [...]

Overall, TrimerTrip predictions using the AMOEBA force field were consistently the best. [...] (Tables 3 and 4).
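The error metrics reported in Table 3 (RMSE, MAE, ME, R², slope m, and Kendall's Tau, computed via bootstrapping with replacement) can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used for the challenge: the example data are hypothetical, the error sign is taken as predicted minus experimental, Kendall's tau is computed in its simple tau-a form, and the bootstrap summary (mean plus a 95% percentile interval) is one common choice among several.

```python
import math
import random

def metrics(pred, expt):
    """Error and correlation metrics for paired predicted/experimental values."""
    n = len(pred)
    errs = [p - e for p, e in zip(pred, expt)]  # assumed sign: predicted - experimental
    mean_p, mean_e = sum(pred) / n, sum(expt) / n
    sxx = sum((e - mean_e) ** 2 for e in expt)
    syy = sum((p - mean_p) ** 2 for p in pred)
    sxy = sum((e - mean_e) * (p - mean_p) for p, e in zip(pred, expt))
    # Kendall tau-a: (concordant - discordant pairs) / total number of pairs
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pred[i] - pred[j]) * (expt[i] - expt[j])
            s += (prod > 0) - (prod < 0)
    nan = float("nan")
    return {
        "RMSE": math.sqrt(sum(x * x for x in errs) / n),
        "MAE": sum(abs(x) for x in errs) / n,
        "ME": sum(errs) / n,
        "R2": (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else nan,
        "m": sxy / sxx if sxx > 0 else nan,  # slope of predicted vs. experimental
        "Tau": s / (n * (n - 1) / 2),
    }

def bootstrap(pred, expt, n_boot=1000, seed=0):
    """Resample (pred, expt) pairs with replacement; return mean and 95% interval."""
    rng = random.Random(seed)
    n = len(pred)
    samples = {name: [] for name in metrics(pred, expt)}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        result = metrics([pred[i] for i in idx], [expt[i] for i in idx])
        for name, value in result.items():
            if not math.isnan(value):  # degenerate resamples have undefined R2/slope
                samples[name].append(value)
    summary = {}
    for name, values in samples.items():
        if values:
            values.sort()
            low = values[int(0.025 * (len(values) - 1))]
            high = values[int(0.975 * (len(values) - 1))]
            summary[name] = (sum(values) / len(values), low, high)
    return summary

# Hypothetical binding free energies in kcal/mol; NOT challenge data.
pred = [-5.1, -6.8, -4.0, -9.2, -7.5, -3.3]
expt = [-5.9, -6.1, -4.8, -8.0, -8.3, -4.1]
for name, (mean, low, high) in bootstrap(pred, expt).items():
    print(f"{name}: {mean:.2f} [{low:.2f}, {high:.2f}]")
```

Resampling whole (predicted, experimental) pairs keeps each prediction matched to its own measurement, so the interval reflects sensitivity to which host-guest systems happen to be in the set.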

The reference method gives free energies for all TrimerTrip host-guest complexes which are too unfavorable, similar to ranked [...]

There were 11 non-ranked submissions for the GDCC dataset in addition to the 4 ranked predictions (Table 3) [...] (Table 3). The non-ranked [...]

Table 3. Error metrics for all (ranked and non-ranked) SAMPL7 methods for all host-guest systems. The root mean square error (RMSE), mean absolute error (MAE), signed mean error (ME), coefficient of correlation (R²), slope (m), and Kendall's rank correlation coefficient (Tau) were computed via bootstrapping with replacement. Shown are results for individual host categories, as well as the combined OA and exoOA dataset. Statistics do not include optional host-guest systems OA-g1, OA-g2, OA-g3, OA-g4, OA-g5, OA-g6, bCD-g1, and bCD-g2.

[...] provide the greatest opportunity for the community to learn.

[...] to rimantadine (g2), though the exact reason for this is not known.

The SAMPL7 host-guest blind challenge provided a platform to test the reliability of computational methods and tools to accurately predict binding free energies. Since hosts in the cucurbituril and cavitand families have been featured in previous SAMPL challenges (and likely in future challenges) these provide a mechanism to assess how the field progresses across a series of challenges. In addition, the amount of attention these have received helps us identify some potential lessons learned and give suggestions for improvement.

The TrimerTrip dataset of SAMPL7, like cucurbiturils from previous challenges, posed the largest challenge for participants, as judged by method performance. Specifically, most methods performed poorly at computing binding free energies for cationic guests with cyclic, aromatic, and adamantane based moieties. In addition, most methods were relatively inconsistent at predicting binding free energies of hydrocarbon chains of increasing length, but the AMOEBA methods did very well, predicting 7 of 8 within 2 kcal/mol. Still, two methods performed relatively well even here, with both using alchemical free energy calculations. Predictions from the best fixed-charge force field submission, based on nonequilibrium free energy calculations (FSDAM/GAFF2/OPC3), had errors above 2 kcal/mol for 8/16 host-guest systems considered. In contrast, performance with the AMOEBA polarizable force field was significantly better here, suggesting that one key source of error may be polarization effects.

In the TrimerTrip case, participants also found evidence that binding free energies may be more accurate if different potential

In part because of the relatively extensive prior work on GDCCs, some submissions applied empirical corrections before making predictions, and/or utilized machine learning approaches. These tended to help performance here, but rely on availability of training data on closely related systems, which is not always available for prospective applications.

On the GDCCs, as for TrimerTrip, submissions using the AMOEBA force field performed particularly well. Additionally, along with a QM based method, AMOEBA correctly predicted g1 to be a nonbinder to exoOA. Perhaps only AMOEBA and QM methods capture relevant polarization effects well enough to accurately describe this particular complex in general, though one MM/PBSA approach also recognized this as a nonbinder.

For the current challenge, the AMOEBA method had the most consistent performance across the different host-guest complexes, and across datasets (TrimerTrip, OctaAcid, exoOA). Despite the lower variation for this method, guest g4 was particularly sensitive to diphenyl ether torsional parameters which worked very well in all other GDCC systems.

The cyclodextrin derivatives were new to SAMPL, and many methods achieved relatively low RMS errors, though this is likely in part due to the low dynamic range of the set. This low dynamic range also meant that correlation metrics were typically poor.
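The link between dynamic range and correlation noted above can be illustrated with a small simulation. This is a sketch under assumed conditions, not an analysis of challenge data: experimental values are drawn uniformly from a range, predictions are those values plus Gaussian noise with a standard deviation of 1 kcal/mol, and the function names and ranges are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def simulated_r2(low, high, noise_sd=1.0, n=2000, seed=1):
    """R^2 when 'experiment' spans [low, high] and predictions add fixed noise."""
    rng = random.Random(seed)
    expt = [rng.uniform(low, high) for _ in range(n)]
    pred = [e + rng.gauss(0.0, noise_sd) for e in expt]  # same error size in both cases
    return r_squared(expt, pred)

narrow = simulated_r2(-5.0, -4.0)   # 1 kcal/mol dynamic range
wide = simulated_r2(-10.0, -2.0)    # 8 kcal/mol dynamic range
print(f"R^2, narrow range: {narrow:.2f}")
print(f"R^2, wide range:   {wide:.2f}")
```

With identical prediction error, the narrow set yields a much lower R² than the wide set, because correlation depends on the spread of the experimental values relative to the noise. This is why RMS error and correlation can tell different stories for a set with a small spread of affinities.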

There were no AMOEBA submissions for this aspect of SAMPL7, but the force field used in this dataset still apparently played a role in computing accurate binding free energies, with GAFF2 seemingly giving more accurate results, followed by CGenFF and GAFF. The performance of methods for the cyclodextrin dataset varied across host-guest systems, but predicting reliable binding free energies for rimantadine with cyclodextrins bearing large sidechains was frequently challenging.

[...] were apparently more difficult.

In terms of overall lessons learned in this challenge, we found that methods which only varied a single factor (such as force field or water model, with a fixed method) were particularly valuable in terms of providing insight into accuracy, thus we urge participants to continue with such explorations in the future. Another important area of work is to ensure that methods which ought to be equivalent do, in fact, give equivalent results across different simulation packages [5].

Overall, SAMPL7 showed marked progress in binding prediction relative to previous challenges, and in particular results with the AMOEBA force field were especially promising for two of the challenge components. For future challenges it will be interesting to continue investigations of host/guest sampling, polarization effects, and possibly salt behavior in similar systems.

We look forward to continuing to work with the community to use the SAMPL challenge to drive accuracy improvements in binding predictions.