Insight into the Binding Energetics of Targeted Reversible Covalent In- hibitors of the SARS-CoV-2 Main Protease

The main protease (M) of the SARS-CoV-2 virus is an attractive therapeutic target for developing antivirals to combat COVID-19. M is essential for the replication cycle of the SARS-CoV-2 virus, so inhibiting M blocks a vital piece of the cell replication machinery of the virus. A promising strategy to disrupt the viral replication cycle is to design inhibitors that bind to the active site cysteine (Cys145) of the M. Cysteine targeted covalent inhibitors are gaining traction in drug discovery owing to the benefits of improved potency and extended drug-target engagement. An interesting aspect of these inhibitors is that they can be chemically tuned to form a covalent, but reversible bond, with their targets of interest. Several small-molecule cysteine-targeting covalent inhibitors of the M have been discovered— some of which are currently undergoing evaluation in early phase human clinical trials. Understanding the binding energetics of these inhibitors could provide new insights to facilitate the design of potential drug candidates against COVID-19. Motivated by this, we employed rigorous absolute binding free energy calculations and hybrid quantum mechanical/molecular mechanical (QM/MM) calculations to estimate the energetics of binding of some promising reversible covalent inhibitors of the M. We find that the inclusion of enhanced sampling techniques such as replica-exchange algorithm in binding free energy calculations can improve the convergence of predicted non-covalent binding free energy estimates of inhibitors binding to the M target. In addition, our results indicate that binding free energy calculations coupled with multiscale simulations can be a useful approach to employ in ranking covalent inhibitors to their targets. This approach may be valuable in prioritizing and refining covalent inhibitor compounds for lead discovery efforts against COVID-19 and future coronavirus infections.


Introduction
Despite the recent success of multiple vaccines in the prevention of COVID-19, there is still an urgent need to develop targeted therapeutics for the treatment of COVID-19. This is imperative given the emergence of rapidly spreading new variants of the SARS-CoV-2 virus 1,2 -the causative agent of COVID-19, which threatens to compromise vaccine efficacy. To date, the nucleoside analog Veklury (remdesivir) remains the only antiviral drug approved by the U.S. Food and Drug Administration for treatment of patients with COVID-19 requiring hospitalization-although no meaningful mortality benefit to patients has been shown. 3 This presents a serious unmet medical need in COVID-19 treatment given the significant global impact of the disease.
Among the group of SARS-CoV-2 protein targets, 4 the main protease (M pro ) -also known as 3-chymotrypsin-like protease (3CL pro ), has emerged as an attractive target for the development of antivirals to combat COVID-19. 5,6 M pro is a highly conserved protein that is essential for the viral replication cycle. M pro has unique substrate specificity and cleaves the viral polyprotein at 11 conserved sites for viral replication and transcription. 7 There are no human homologs of the M pro with the same specificity, so inhibiting M pro offers reduced risk for off-target reactivity and represents a suitable therapeutic strategy in the development of pan-coronavirus drugs.
Several small-molecule inhibitors of the SARS-CoV-2 M pro have been discovered, 8-17 with a few progressing into early phase human clinical trials. [18][19][20][21] The fundamental aim of these inhibitors is to disrupt the viral life cycle. One therapeutic strategy is to design inhibitors with electrophilic functional groups (a.k.a., "warhead") that bind to the nucleophilic active site cysteine (Cys145) of the M pro target. Cys145 of the M pro is essential for catalytic activity, so blockage or modification of this residue is detrimental to the virus. This strategy of targeted covalent modification of druggable sites in disease targets is gaining traction in drug discovery, 22-27 as evidenced by the successful number of approved drugs employing this approach (e.g., ibrutinib). 28 Importantly, covalent inhibitors offer superior potencies, prolonged duration of therapeutic action, and high target selectivity than their non-covalent drug binding counterparts. Also, since such covalent binding events can be chemically tuned to be reversible, 29 concerns regarding the risk of potential off-target reactivity and toxicity can be mitigated while maintaining potency and efficacy.
Understanding the binding energetics and interactions of inhibitors with their targets offers insight at the molecular level to facilitate drug design and discovery. Over 1000 high-resolution crystallographic structures of SARS-CoV-2 proteins have been determined, 30 which has enabled structure-guided drug design efforts towards the development of potent compounds as therapeutic leads in the fight against COVID-19. In silico methods such as virtual screening, molecular dynamics, and binding free energy calculations have provided structural insight into the binding interactions, molecular mechanism, and conformational dynamics of inhibitors binding to the M pro target to inform COVID-19 drug discovery. 31 In our own work, we have showed that alchemical absolute binding free energy (ABFE) calculations coupled with multiscale quantum chemical calculations can predict the binding energies of covalent inhibitors to the M pro target reasonably well. 32 Building upon our previous work, we employ rigorous binding free energy calculations to explore the binding energetics of reversible covalent inhibitors of the SARS-CoV-2 M pro target. Reversible covalent inhibitors form covalent but reversible bonds with their target enzyme (Figure 1(a)) and are considered safer alternatives to conventional irreversible covalent inhibitors due to less risk for idiosyncratic toxicities. The inhibitors studied include 11a, 12 GC373, 14 MI-30, 16 and PF-00835231 ('PF-231'), 17 Figure 1(b). These inhibitors contain either an aldehyde or ketone functional group as electrophilic warhead and can reversibly bind to the catalytic Cys145 of the M pro , 14,17 forming thiohemiacetal/thiohemiketal tetrahedral complexes. To calculate the absolute binding free energies of the inhibitors to the M pro target, we employed alchemical free energy calculations and ONIOM QM/MM calculations on model protein-ligand complexes of the system. The alchemical ABFE calculations were performed using both non-enhanced and enhanced replica-exchange sampling techniques within the equilibrium free energy framework. 33 Our results indicate that enhanced free energy sampling methods via the Hamiltonian replica-exchange algorithm (H-REMD) 34 improved the convergence and standard deviation of the binding free energy predictions. These results provide valuable insight into the binding energetics of reversible covalent inhibitors to the M pro target and highlights an important case where enhanced sampling techniques can improve the convergence of computed free energy estimates. This has the potential to assist in the rational design and evaluation of novel chemical entities, as well as support decisions in lead optimization campaigns towards identifying potent antiviral leads for current and future coronavirus diseases.

Methods
System Setup: Model structures of the protein-ligand complexes were downloaded from the Protein Data Bank. 35 The PDB accession codes for the complexes studied include 6LZE, 12 7D1M, 15 7D3I, 16 and 6XHM, 17 for 11a, GC373, MI-30, and PF-231, respectively. The covalent bond between the warhead of the inhibitor and active-site cysteine (Cys145) of the protein were broken prior to running the molecular dynamics (MD) simulations. The M pro amino acid residues were assigned protonation states corresponding to their default values at neutral pH. We have previously evaluated methods for the calculation of the pKa of titratable cysteine residues in proteins 36 and have performed MD simulations to estimate the protonation state of the catalytic Cys145/His41 dyad of the M pro . 32 Based on our results and the work of others, 37-39 we treated the catalytic Cys145/His41 dyad of the M pro as neutral residues. We also modeled His41 residue in two tautomeric states (i.e., N-and N-protonated states) to explore the preferred protonation state. The protonation state of histidines in SARS-CoV-2 M pro , in particular His41, has been shown to impact ligand binding and protein structure dynamics. 40 The GROMACS MD software 41 (version 2020.4) was used to perform the calculations. Model structures of the protein-ligand complexes were prepared in explicit solvent using the TIP3P 42 water model. The AMBER99SB-ILDNP 43 and GAFF 44 force fields were used to generate parameters for the protein and ligands, respectively. Ligand charges were parameterized using AM1-BCC. 45,46 The model systems were solvated in a dodecahedral box with a maximum cutoff distance of 12 Å from the edge of the box. The simulation cell was charge neutralized with an appropriate number of Na + ions. Following this, energy minimization and 2-ns equilibration runs (using both NVT and NpT ensembles) were performed on the system. This was subsequently followed by 10-ns simulation of the complex. The MD simulations were performed to determine the preferred protonation state of His41 in the model complexes prior to running the binding free energy simulations.
Absolute Binding Free Energy Calculation: We performed ABFE calculations to compute the non-covalent binding free energy contribution of the inhibitors to the M pro target. ABFE calculations, although computationally expensive, provide a theoretically rigorous framework 47 for estimating ligand binding energies and have been shown to yield results that agree well with experimental binding energies. [48][49][50][51][52] For performing the ABFE calculations, we followed the protocol reported in our recent publication 32 on estimating the total binding energy of peptidomimetic inhibitors of the SARS-CoV-2 M pro . Two sets of simulations for each model system were performed for the binding energy calculations, consisting of the ligand in bulk solution and in the protein-binding site. A set of restraints defined by one distance, two angles, and three dihedral harmonic potentials were imposed on the bound ligand to keep it in the protein-binding site following the decoupling steps of the free energy calculation (see Supporting Information for details). For the ligand in bulk solution, the restraint term following the decoupling event is computed analytically using the method proposed by Boresch et al. 53 Similar to our previous protocol, 32 31 windows were used for the ligand in bulk solution and 42 windows were used for the protein-ligand simulation. For each window, a series of simulations consisting of 100,000-step steepest descent energy minimization, 1ns of NVT equilibration and 1 ns of NPT equilibration were performed. A timestep of 2 fs was used for the simulations with reference temperature set to 298.15 K. The LINCS constraint algorithm was used to constrain H-bonds. The Particle Mesh Ewald (PME) algorith 54,55 was used to treat long-range electrostatic interactions and a cut-off distance of 12 Å was used for short-range electrostatics.
Following the equilibration simulations, a series of 12 ns production runs were performed for each window under isothermal-isobaric ensemble conditions using the Parrinello-Rahman pressure coupling scheme. 56 The simulations were performed with and without enhanced sampling using the Hamiltonian replica-exchange MD (H-REMD) approach. All simulations were performed in triplicate to estimate the uncertainty in the calculations and to check for convergence in the results. For each protein-ligand model system, an aggregate total of ≃ 5.5 µs in sampling time was achieved. The results of the simulations were analyzed using the alchemical_analysis.py python tool, 57 and were calculated using multistate Bennet acceptance ratio (MBAR) 58 method after discarding the first 2 ns from each window as equilibration. The reported binding free energy results are the averages and standard deviation of the three independent repeats.
ONIOM(QM:MM) Methodology: To estimate the energy leading to the formation of the covalent bond between the target Cys145 of the M pro and the reactive center of the inhibitors, we performed QM/MM calculations on the complexes using the ONIOM 59 approach implemented in Gaussian 16. 60 Explicit solvent water molecules were excluded from the model system. For the QM/MM ONIOM model, we partitioned the system into two layers: (1) a High-level QM region consisting of the full inhibitor structure, Cys145 and His41 side chains (2) Low-level molecular mechanical region which accounts for the remaining residues of the system. On average, our high-level QM region consisted of 74-82 atoms, while the low-level MM region contained 4599-4649 atoms. We treated the high-level region using the M06-2X functional with the def2-TZVP basis set-which performs well for modeling the thermochemistry 61 and reaction energies 62,63 of chemical reactions. The lowlevel region on the other hand, was treated with the amber molecular mechanical force field. Missing ligand atom parameters were obtained from the generalized Amber force field (GAFF). 64 Hydrogen link atoms were used to cap the QM-MM boundary region. Figure  2 shows a representative snapshot of the atoms comprising the QM region for a selected model system. The ONIOM model structures were geometry optimized using the M06-2X/def2-TZVP:AMBER method within an electronic embedding scheme. We also performed frequency analyses on the optimized geometries using the same level of theory to verify that the optimized structures were minima and to calculate the Gibbs energy.

His41 Protonation State
The active-site of the SARS-CoV-2 M pro consists of catalytic dyad residues (His41/Cys145) which are responsible for the proteolytic activity and function of the protease. 65 Among the residues involved in the enzymatic activity of the M pro , His41 is suggested to act as a proton carrier of the thiol hydrogen of Cys145 66,67 -enabling nucleophilic attack on the reactive center of the inhibitor by the activated Cys145 thiolate formed. Recent work by Gumbart and coworkers have shown that the protonation state of His41 of the M pro is sensitive to the presence of a ligand, 40 which can influence the structural stability and dynamics of the M pro complex. Therefore, for each of the protein-ligand complexes studied, we investigated the preferred protonation state for His41 to inform our binding energy simulations.
Prior to running the binding free energy calculations, we performed 10-ns MD simulations on the model complexes to determine the preferred protonation state of His41. The protonation states of His41 in the model structures were assigned either as HID (N ! protonated) or HIE (N " protonated), according to amber force field naming convention. All the other histidines were left unaltered in their default protonation states as chosen by the pdb2gmx program of GROMACS MD engine (see Supporting Information). The difference in structure and binding interactions of the inhibitors with the M pro target in the structures were analyzed upon detailed visual inspection of their contacts. Also, the root-mean-square deviation, or RMSD, between select corresponding atoms ( -carbon of the protein and heavy atoms of the ligand) in the simulated complexes and the crystallographic starting structures were measured after alignment of the structures. Our results indicate that the inhibitorbound complexes of 11a, GC373, and MI-30 favour the HID states of His41, while the PF-231 inhibitor-bound complex favours the HIE state of His41. Figure 3 depicts snapshots of the structural environment of the catalytic dyad configurations in the M pro target complexes taken from the equilibrated MD simulation.
The distance between the carbonyl carbon of the inhibitors and the cysteine S atom is around 3 Å for all the equilibrated inhibitor-bound M pro complexes (Figure 3). For both 11a and GC373 M pro complexes, the thiol side chain of Cys145 forms a direct hydrogen bonding interaction with the N " of His41 ( Fig.3; Top). On the other hand, for MI-30 and PF-231 M pro complexes, we observe a hydrogen bond between Cys145 thiol side chain and a nearby water molecule ( Fig.3; Bottom). In fact, studies have shown than both His41 and conserved catalytic water molecules in the protein active site play an active role in the reaction mechanism of the M pro . 38,66-70 As a minor caveat, only the protonation states of His41 was explored in our model complexes due to its significant role in the binding mechanism. 65 Binding free energy simulations that can model the dynamic variation in residue protonation and tautomeric states will be useful to explore the influence of other titratable residues-although not the focus of this study.

Non-covalent Binding Free Energy Contribution
The non-covalent binding free energy contribution of the inhibitors to the M pro target was computed using alchemical absolute binding free energy calculations. Alchemical ABFE calculations provide a direct way of computing ligand binding energies within a rigorous statistical thermodynamic framework. 47 This approach employs a nonphysical thermodynamic cycle to compute inhibitor binding free energies and considers the conformational entropy and dynamics of both the protein target and inhibitor upon binding. 47,71,72 ABFE calculations can yield highly accurate binding affinities for ligands that agree very well with experiment. [48][49][50][51][52]73 Recently, we employed this approach to estimate the binding affinity of the peptidomimetic inhibitors N3 and α-ketoamide of the SARS-CoV-2 M pro . 32 In this work, we calculated the absolute binding free energies of the inhibitors to the M pro target using both non-enhanced and enhanced sampling techniques within the equilibrium ABFE framework. For the enhanced ABFE simulations, a Hamiltonian replica -exchange molecular dynamics (H-REMD) algorithm 34 was employed to facilitate sampling convergence in the free energy estimates. Table 1 provides a summary of the binding free energy results with and without H-REMD for the complexes studied in this work.
The results show that for each inhibitor-M pro pair, both equilibrium approaches yield relatively similar results for the binding free energy estimates. However, the uncertainty in the computed binding free energy estimates is larger for the non-enhanced equilibrium ABFE approach (no H-REMD) relative to the enhanced sampling approach (H-REMD). This result is consistent with previous studies that have showed that the inclusion of the replica exchange algorithm in alchemical free energy simulations greatly improves the conformational sampling and convergence of free energy estimates. 73,74   Our binding free energy results indicate that the non-covalent binding free energy contributions of the inhibitors to the M pro target is greatest for the PF-231 inhibitor ( Table 1). This is followed by the MI-30 inhibitor, which has a predicted binding energy of ~7.5 kcal mol -1 to the M pro . The inhibitors 11a and GC373 were predicted to have lower binding affinities in comparison to PF-231 and MI-30. Importantly, the predicted binding free energy results we report represent the non-covalent interaction energies between the inhibitors and the target protein. We were unable to find experimental inhibition constants (Ki) for all the inhibitors to directly compare with our predicted binding free energy results. However, comparison to experimental IC50 values for the inhibitors suggest that our predicted non-covalent binding free energy results correspond well with the relative potency rankings of the inhibitors to the M pro (Figure 4),although we note that IC50's in general are not suitable descriptors for estimating the biochemical potency of covalent inhibitors. [75][76][77] This is because IC50's for covalent inhibitors will vary depending on the incubation time of the assay and do not provide any information about the affinity or chemical reactivity of a covalent inhibitor.  An example we highlight is Glu166, whose backbone amino acid groups and carboxylate side chain establishes multiple hydrogen bonding contacts with functional groups of the inhibitors (Figure 5, right panel). Other hydrogen bonding residues include Gly143, His164, Cys145, Gln189, and Ser144. In addition to these hydrogen bond interactions, there are van der Waals and hydrophobic interactions between the M pro residues and the inhibitors. Together, these interactions between the target protein and inhibitors contribute to the binding free energies of the inhibitors to the M pro .

Covalent Binding Free Energy Contribution
Covalent inhibitors typically follow a two-step kinetic mechanism characterized by an equilibrium dissociation constant (Ki) for the non-covalent E:I complex and an inactivation rate constant (kinact) to form the covalent E-I complex, Figure 1.(a). Reversible covalent inhibitors differ from irreversible inhibitors in that the step leading to the formation of the final covalent E-I complex is not permanent.
Modeling the binding of covalent inhibitors with their targets requires characterization of both the non-covalent and covalent binding free energy steps. 32,[79][80][81][82] Multiscale QM/MM calculations are useful in this regard since they allow chemical processes to be modeled-particularly, the chemical bond formation step between the target residue in the protein and an inhibitor. 62,83 To estimate the covalent binding free energy contribution of the M pro inhibitors studied, we performed ONIOM (QM:MM) calculations on the model systems. Using this approach, we calculated the relative Gibbs energy difference between the equilibrium non-covalent complex and the covalent complex. Our results for the covalent binding free energy of the inhibitors to the M pro target are summarized in Table 2. Table 2. Summary of the covalent binding free energy (∆ ) results of the inhibitor bound M pro complexes studied in this work.
M pro -MI-30 -9.8 M pro -PF-231 -6.8 The results indicate that the reaction leading to the formation of the covalent bond between Cys145 of the M pro and the inhibitors is moderately exergonic. For irreversible Michael acceptor inhibitors of the M pro , covalent binding reaction energies as high as -28 kcal/mol have been predicted. 84 We found that among the complexes studied, the reaction free energy for adduct formation is greatest for the MI-30 inhibitor bound complex (ΔGcovalent = -10.5 kcal/mol). The covalent binding free energy calculated for the other adducts are -6.0 kcal/mol, -8.3 kcal/mol and -6.8 kcal/mol, for the inhibitors 11a, GC373, and PF-231, respectively (Table 2). These results suggest that the inhibitors 11a and PF-231 engage more reversibly towards the M pro target than GC-373 and MI-30 inhibitors. The relative magnitude of the covalent binding free energy predicted for these inhibitors is in line with their reversible characteristics as observed by experiment. 14,17 In addition, our predicted covalent binding energy results for the M pro -PF231 and M pro -11a inhibitor bound adducts are in agreement with recent multiscale QM/MM studies that investigated the mechanism of covalent inhibition of the M pro target by these compounds. 69,85 Interestingly, we observe different covalent binding energies for the inhibitors 11a, GC373, and MI-30 which share a common warhead (i.e., an aldehyde functional group). This suggests that there are a range of factors beyond warhead type that can influence the binding energetics of reversible covalent inhibitors to their targets. 86 Knowledge of both covalent and noncovalent binding free energy contributions is essential in determining the overall binding energetics of covalent inhibitors to their enzyme targets.

Conclusion
In this study, we performed rigorous absolute binding free energy simulations and ONIOM QM/MM calculations to estimate-from a thermodynamical standpoint-the energetics of binding of some promising reversible covalent inhibitors of the SARS-CoV-2 M pro . The inhibitors studied include 11a, GC373, MI30, and PF-231. These inhibitors contain either an aldehyde or ketone moiety as warhead and form a covalent bond with the active site Cys145 of the M pro . Cys145 is essential for the catalytic activity of the M pro , so modification of this residue by a targeted covalent inhibitor will have deleterious effects on the function of the M pro and the virus as a whole.
Our computational approach involves the calculation of both the covalent and non-covalent binding free energy contributions of inhibitors to the M pro target. From an in silico perspective, understanding these two binding contributions is important for the efficient optimization and design of potent compounds. For estimating the noncovalent binding free energy contributions of the inhibitors to the M pro target, we employed equilibrium binding absolute free energy simulations using both non-enhanced and enhanced sampling methods. Our results indicate that enhanced sampling methods via the replica exchange algorithm yield binding free energy estimates that are better converged and have a lower standard deviation than the non-enhanced methods. Hydrogen bonding and dispersion forces are significant contributors to the non-covalent binding free energy of the inhibitors to the M pro target. Our calculation of the covalent binding free energy contribution of the inhibitors to the M pro suggest that the reaction is moderately exergonic, which is in line with the reversible character of these inhibitors as reported by experiment. We observe different covalent binding energies for M pro inhibitor compounds sharing a common aldehyde warhead group, suggesting that multiple factors other than warhead type influence the binding energetics of reversible covalent inhibitors.
Finally, we note that the present work has some caveats that needs pointing out. We only explored different protonation states of the catalytic His41 in our simulations due to its critical role in the molecular mechanism. But in principle, all histidine protonation states should have been explored and considered. Nevertheless, we find that binding free energy estimates of the inhibitors to the M pro target are impacted when different protonation/tautomeric states are chosen for the active site His41 residue. We caution selection of appropriate residue protonation and tautomeric states (in particular, for His) for ionizable residues when performing binding free energy studies of the M pro target. Methods that can account for the dynamic variation in titratable residue and ligand protonation states at different solution pH will be valuable in this regard, enabling complex dynamic transitions in protein-ligand binding events to be modelled more accurately.