Toward Simple, Predictive Understanding of Protein-Ligand Interactions: Electronic Structure Calculations Join Forces with the Chemist’s Intuition

Contemporary efforts for modeling protein-ligand interactions entail a painful tradeoff – as reliable information on both noncovalent binding factors and the dynamic behavior of a protein-ligand complex is often beyond practical limits. In the following paper, we demonstrate that information drawn exclusively from static molecular structures can be used for the semi-quantitative prediction of experimentally-measured binding affinities for protein-ligand complexes. In the particular case considered here, inhibition constants (Ki) were calculated for eight different ligands of torpedo californica acetylcholinesterase using a simple, multiple-linearregression-based model. The latter, incorporating five informative and independent variables – drawn from QM cluster, DLPNO-CCSD(T) calculations and LED analyses on the eight complexes – is shown to recover no-lessthan 96% of the sum of squares for measured Ki values, and used to predict the inhibition potential for yet another ligand (E20, for which no Ki values are available in the literature). This thus challenges the widespread assumption that “static pictures” are inadequate for predicting reactivity properties of flexible and dynamic protein-ligand systems. ABSTRACT Contemporary efforts for modeling protein-ligand interactions entail a painful tradeoff – as reliable information on both noncovalent binding factors and the dynamic behavior of a protein-ligand complex is often beyond practical limits. In the following paper, we demonstrate that information drawn exclusively from static molecular structures can be used for the semi-quantitative prediction of experimentally-measured binding affinities for protein-ligand complexes. In the particular case considered here, inhibition constants (K i ) were calculated for eight different ligands of torpedo californica acetylcholinesterase using a simple, multiple-linear-regression-based model. The latter, incorporating five informative and independent variables – drawn from QM cluster, DLPNO-CCSD(T) calculations and LED analyses on the eight complexes – is shown to recover no-less-than 96% of the sum of squares for measured K i values, and used to predict the inhibition potential for yet another ligand (E20, for which no K i values are available in the literature). This thus challenges the widespread assumption that “static pictures” are inadequate for predicting reactivity properties of flexible and dynamic protein-ligand systems.


Introduction
Protein-ligand (PL) interactions have drawn great amounts of scientific attention throughout the last century (see Refs. [1][2][3][4] for a few recent textbooks and reviews). Besides being examined for playing crucial roles in a variety of essential biochemical processes, such interactions are often focused on in many drug design studies -revolving around finding inhibitors for proteins such as enzymes and neuroreceptors for the purpose of invoking a desirable biological response. [5][6][7][8] Due to such considerations, many researchers from a broad spectrum of scientific disciplines (consisting of computational biologists and biochemists as well as theorists from chemistry and physics) have attempted to provide some general theoretical/computational modeling schemes for predicting biochemically-relevant PL binding events. [9][10][11][12][13] It has been well-established that PL systems are greatly influenced by noncovalent interactions (NCIs). [14][15][16][17][18][19] The latter, resulting from subtle electronic effects, are very small in magnitude and cannot virtually be measured by experimental means. Thus, ab initio electronic structure methods constitute a precious (and almost exclusive) source of information on biochemically-relevant NCIs -which, in turn, often used for the parametrization and calibration of more approximate computational modeling techniques (such as DFT functionals and molecular mechanics force fields). [20][21][22][23] Ideally, one could use such nonempirical electronic structure methods for running molecular dynamics (MD) simulations on realistic PL systems; in such scenario, the information drawn from such simulations would include an adequate description of biochemically-significant NCIs, and it can thus be expected to offer desirable predictive power (which is, after all, the main goal of any theoretical model). However, electronic structure calculations are notorious for their steep computational cost scaling with the system's size -which generally precludes using them for MD simulations on realistically-sized biochemical systems (excluding a few recent approximate approaches, each entailing different methodological challenges; see, for instance, Refs. 24,25 ).
For this reason, and since some description of NCIs relevant for PL binding is clearly crucial for predictive purposes, 26,27 electronic structure calculations are mostly combined with additional computational techniques used for describing the dynamic, continuous relationship between PL pairs that leads to biochemically-significant (active-site) binding. In this manner, electronic structure calculations are performed on static structures, which are assumed to represent crucial events in the PL binding process (see Ref. 28 for a recent, comprehensive 3 review). It is generally assumed, for instance, that the actual biochemically-significant binding event -taking place in the protein's active site -must incorporate some description of noncovalent binding factors. Thus, one common piece of information on PL interactions provided by electronic structure methods corresponds to the PL binding energy -calculated as the energetic difference between the bound PL structure and its underlying protein and ligand structures found at infinite separation (Eq. 1): Δ" #$%& = " )* − (" ) + " * ) [1] Were P and L stand for protein and ligand, respectively (in their complex-structure geometry). It should be pointed out that the relationship between such calculated energetics and realistic PL systems is quite unclear (as said, PL binding is a continuous, dynamic process; representing it using such "binary" means -i.e., bound complex vs. free structuresclearly ignores this fact); still, quite a few authors have employed such quantities as bits and pieces of information in more-general predictive theoretical/computational schemes -where additional such pieces, obtained using different techniques (e.g., classical MD trajectories), are also used. [29][30][31][32][33][34] Needless to say, such multi-method efforts require an appropriate multimethod-expertise from the researcher, and entail lots of (perhaps undetectable) sources of error and technical difficulties -as demonstrated in Figure 1.
When interested in predictive modeling of PL systems, we are therefore faced with a painful dilemma ( Figure 2). An appropriate description on biochemically-relevant NCIs is, on the one hand, required; the dynamic relationship between PL pairs cannot, on the other hand, be ignored; holding on to one source of information and letting go of the other would make our inquiry simple and elegant, but wrong and unreliable; trying to hold on to both complicates things further, as reasonable interfaces between different kinds of information must be established -giving rise to many corresponding sources of error that cannot necessarily be assessed.
4 Figure 1. A hypothetical, "conventional" molecular-modeling-based ligand identification process, employing quantum chemistry methods. Compare with Figure 5, which illustrates our approach as proposed in the present paper (Acronyms: QC = quantum chemistry, MD = molecular dynamics, QM/MM = hybrid quantum chemistry -classical molecular mechanics methods. Some crucial problems threatening the process' success are outlined in red).
Motivation: finding potential inhibitors for, e.g., some enzyme of interest EZ Identify a candidate inhibitor α -assumed to "behave" similarly to known ligands. Sources of information: chemical intuition/docking techniques/QSAR methods...

Stage 1. Establishing a Pool of Candidates
Confirm/refute the existence of a stable α-EZ complex: Scan the potential energy hypersurface for such complex using QC methods à locate local/global minima, representing bound structures Use classical MD simulations for sampling potentially-bound structures (e.g., where α is within binding distance to active amino acid residues) à investigate further using QC or QM/MM methods.

Stage 2. Candidate Verification
Compare α's binding affinity (K i ) with that of known ligands (L n ). But how can K i be assessed? Let X and Y be the binding energies for the L 1 -EZ and α-EZ complexes, respectively. If X<Y à K i of α is larger, and vice versa.
Binding affinities often reflect much more information than calculated binding energies! (Lock-key model: many dynamic factors should also be taken into account) Run classical MD simulations for both ligands à come up with adhoc measures for K i based on resulting trajectories No explicit treatment of quantum mechanical electronic behavior, so no way of knowing whether binding forces are appropriately described! The main aim of this paper is demonstrating that this dilemma may often be avoided -by using electronic structure calculations on static molecular structures that also provide some important information on the dynamic nature of PL binding processes. In such manner, it should be possible to avoid using MD simulations altogether and still arrive at valuable predictive models -which may guide future experiments and drug discovery studies. Being mainly interested in utilizing the information offered by electronic-structure methods, and not in specific state-of the-art data analysis and modeling techniques, we will limit our discussion to a very simple predictive model type -based solely on multiple linear regression (MLR).
The latter, incorporating independent variables from the aforementioned electronic structure calculations, will be used for a semi-quantitative prediction of experimentally-measured inhibition constants, or Ki values (which are ubiquitously used as a practical measure for binding affinities, and compared across different ligands as a relative, realistic biochemical reactivity potential with respect to a specific target protein). [35][36][37][38][39][40] Our assumptions, in this context, may be summarized as follows: (a) Active-site binding corresponds to a critical event in the inhibition process; that is, inhibition cannot occur in the absence of such event.
(b) A combination of independent energetic components derived from a sufficientlyaccurate description (which includes noncovalent binding factors) of this binding event is characteristic to a given ligand's isomeric structure and chemical composition. That is, a significant change in the latter would result in qualitatively-different such components.
(c) Individual local-energy-decomposition (LED; see computational methods section) contributions exhibit well-defined intermolecular distance dependence; 41

Computational Methods
All geometries used in this work were obtained in the following manner: 1. Nine crystal structures of torpedo californica acetylcholinesterase (Tc AChE), each containing a different bound ligand (a.k.a inhibitor) in its active site, were drawn from the PDB website (see corresponding research papers in Refs. [35][36][37][38][39][40]42 ;).
2. "Potentially-active" amino acid residues, defined to be found within 3 Å from any atom in the ligand structure (thus being capable of significantly-interacting with the latter; see, for instance, section 2.2 in Ref. 43 ) were selected via 'CONTACT' calculations included in the CCP4 suite. 44 3. Residues found in the preceding stage for each crystal structure were then simply taken alongside the corresponding bound ligand to create the final active-site + ligand geometries used throughout this paper. Electronic structure calculations were then performed exclusively on the resulting structures, and thus correspond to "QM cluster" calculations -according to the taxonomy used in Ref. 28 All calculated data considered in this paper were obtained using DLPNO-CCSD(T) calculations and subsequent LED analyses included in the ORCA 4.2 program package; 41,45 "NormalPNO" settings, 46 as well as the def2-SVP basis set, 47 were used for all of the latter.
Thus, all data were drawn from LED outputs in the following manner: • DLPNO-CCSD(T)/SVP inter-fragment binding energies were drawn from the "Sum of INTER-fragment total energies" entry, found in the "INTER-vs INTRA-FRAGMENT TOTAL ENERGIES (Eh)" section in the LED outputs. As a sanity check, we verified that binding energies derived from subtracting the sum of "Intrafragment total energies" from the "total energy" for a given PL complex (both found in the same section in LED output) produce identical energetic values -as shown in the ESI. Note that different definitions for "binding energies" can be found in the literature (some actually correspond to the "interaction energies" mentioned in the introduction); in our case, the term simply corresponds to the difference in total energies between the super-system and its underlying protein and ligand fragments Note that whereas the nonempirical DLPNO-CCSD(T) method and LED approach are used for generating the data considered in this paper -other methods (such as those based on a 8 perturbation theory formalism) may generally be used for similar purposes. [48][49][50][51] It should also be mentioned that the above basis set and PNO domains may rightfully be considered inadequate for quantitatively-accurate electronic structure calculations (resulting in energetics found within 1 kcal/mol from a reliable reference level) of noncovalent interactions in vacuum. 46 That being said, it should be stressed that accurate calculation of NCI energetics should not be recognized as one of the main goals of the current paper. Instead, we will focus on using the very basic information derived from LED calculations for predictive purposes.
The semi-quantitative manifestation of this purpose, as we shall show below, is independent of such extreme quantitative accuracy.
Multiple linear regression was carried out using the "Analysis Toolpak" add-in for Microsoft Excel 2018 (Macintosh version); 95% confidence intervals were consistently employed for all resulting models. For reproduction purposes, all relevant geometries and ORCA input files used for this paper are provided in the ESI, alongside a dedicated spreadsheet containing our raw and calculated data.
We would like to suggest that our methodological choices and considerations may be of particular interest on their own (and not just as means for achieving the main, stated goal of this paper); the interested reader may browse through associated methodological discussions, questions and answers -all provided in Appendix B: Methodological Meditations.

Results and Discussion
Experimentally-measured inhibition constants [expressed as log(Ki)], and the calculated data for all ligands under consideration (i.e., inter-fragment binding energies and corresponding LED contributions, all given in kcal/mol), are provided in Table 1. 9 Clearly, drawing predictive conclusions from this data using nothing but the naked eye is no easy task -as there is no clear one-to-one correspondence between any of the calculated variables and the experimentally-measured quantities. Let us therefore resort to an MLRbased picture -from which, as will be shown below, some interesting conclusions may be drawn.
For illustration purposes, a plot of experimental Ki values is given in  Table 2) as well as by the similarity between the resulting predicted curve and that of (A).   Let us now consider a third model (M3) -incorporating both binding energies and LED data employed in M1 and M2, respectively. As shown in Table 2 and Figure 3.D, this model clearly has striking predictive power -as it recovers no-less-than 96% of the SSQ for the experimentally-measured Ki values. Residual errors are also much smaller, and their distribution narrower, compared to M1-2: the single-largest error amounts to 0.476 log(Ki), which almost matches the very smallest, qualitatively-significant errors found for the previous models. What this all means is that the totality of information corresponding to both overall binding strength and specific NCIs for each of the ligands may be used for reproducing the experimentally-measured Ki curve in a semi-quantitative manner.
Indeed, some of the particular binding affinities predicted by M3 are, to some extent, overestimated/underestimated (note DZ0/CP0/DZ7). Nevertheless, despite not having ideal predictive value -this simple model may clearly provide useful comparative estimates, corresponding to realistic inhibition potentials for different ligands.
As a basic demonstration, let us use the data calculated for yet another ligand, E20 (taken from the 1EVE structure, see Ref. 42 ) -which is also given in Table 1 Table 2). processes. This statement clearly has substantial implications on contemporary chemistry knowledge -and we hope it will be of service in future scientific efforts concerning such interactions. As mentioned in the introduction, the general idea which underlies our current approach (illustrated in Figure 5) is, by no means, new. Quite a few great chemists have attempted to conduct similar arguments (see Appendix A), but they seem to have lacked the appropriate technical means needed for establishing solid, data-based conclusions. Luckily, state-of-the-art quantum chemistry methods have provided us with the required missing pieces of information -and thus bringing the corresponding puzzle of scientific knowledge to its completion has finally become possible.
It is, perhaps, an appropriate time to remind the reader that we do not, on a general principle, recommend using simplistic MLR-based models such as above for practical investigations of PL systems. What we did mean to demonstrate is that some particular static molecular structures, described using nonempirical electronic structure methods, clearly possess enough information for modeling biochemically-significant PL interactions. We can only hope that this basic insight will, eventually, be combined with more elaborate and robust modeling techniques. As a side note, we would like to mention that our above results, methodological considerations and assumptions may be of interest for several additional reasons (which had not been discussed in preceding sections): [a] physical meaning of LED contributions is different than that of "realistic" NCIs -which do not necessarily exhibit a well-defined dependence on the intermolecular distance; in addition, the relationship between such calculated components and the total binding energy is nontrivial; [b] using and validating MLR models for confirming the very informativeness of predictive variables is a fundamentally different task than establishing statistically-robust models for practical 14 applications -although the two may easily be confused. Thus, we hereby encourage the reader to browse through Appendix B, where such matters are discussed in appropriate length. Figure 5. An "alternate ending" to the process presented in Figure 1, making use of our own energydecomposition-analysis-based approach as outlined above (Acronyms: PL = protein-ligand, NCI = noncovalent interactions).

Summary and Conclusions
Based on our above investigation of the Tc AChE enzyme and associated ligands/inhibitors, the following conclusions may concisely be summarized: • We have seen that informative, static molecular structures -corresponding to bound protein-ligand complexescan be used for a semi-quantitative prediction of the corresponding, experimentally-measured Ki values. Such successful prediction is by no means trivial, due to the fact that binding affinities are assumed to result from a large variety of dynamic factors affecting the continuous PL binding process.
• Multiple-linear-regression-based models incorporating either inter-fragment binding energies or LED components calculated for the bound PL structures do not possess desirable predictive power -as they cover only 50% and 81% of the sum of squares Compare α's binding affinity (K i ) with that of known ligands (L n ). But how can K i be assessed?
Employ each of the former contributions as an independent variable; fit the model to experimentally-measured binding affinities for L n .

Stage 3. Evaluating the Candidate's Inhibition Potential
3.1 Run appropriate electronic structure calculations on α and L n :

Use M to Predict K i (α):
Predicted binding affinity would reflect the electronic, NCI-based correspondence between the protein and different ligand candidates.

Our Energy-Decomposition-Analysis-Based Approach:
. . . • In contrast, a model employing both binding energies and LED components does offer surprisingly-useful predictive capabilities, covering no less than 96% of the sum of squares for the experimentally-measured values. Additionally, the residual error distribution is significantly narrower than those of the above alternatives; the largest such error, in this case, amounts to the smallest significant ones found for former models.
• The above model may easily be used for predicting Ki values which have not yet been experimentally-measured. In such manner, the realistic inhibition potential for the E20 ligand was assessed to be greater than those of DZ7 and E12, but smaller than that of NHG.
• The statistical significance of calculated binding energies and LED components cannot be exclusively attributed to the number of independent parameters and corresponding fitting coefficients used in each model (Appendix B, Q2). Thus, our calculated data clearly has inherent predictive value.
• Despite the fact that LED components do not represent physically-realistic noncovalent interactions (arising from subtle, dynamic electronic effects), they do incorporate highly-valuable information on the latter (Appendix B, Q1). Such information may be combined with additional data (in our case, calculated binding energies) for the purpose of predicting realistic chemical quantities.
Our above conclusions may also be intuitively-understood using the classic "lock-key" analogy 53 for PL binding: overall active-site binding energetics may be considered to provide some information on a given keyhole's "size", while PL complex-specific NCIs (represented by specific LED contributions) incorporate information on its corresponding "shape".
Whereas an entire lock's mechanism cannot simply be inferred from its keyhole's properties -focusing on the latter may often suffice for practical predictive purposes.
(Interestingly enough, the "lock-key" analogy had later been "replaced" by an alternative, "glove-hand" version 54 -which seemed to demonstrate some important aspects of PL binding processes which were "ignored" by its predecessor. That, however, clearly entailed simplifying/neglecting other such aspects. In any case, analogies of this sort are merely used for facilitating intuitive understanding and should not be taken too literally.) 16 As a final remark, we would like to express our hopes and great anticipation for additional efforts concentrated on supplying predictive scientific explanations based on chemical intuition. The latter, which may be seen as one of the most prominent achievements of modern science, has apparently not been fully utilized by means of currently-available scientific methods and techniques. We can only hope to contribute to such efforts and witness the science of chemistry as it reaches new and exciting heights.

Acknowledgments
All active-site geometries used in this work were extracted from .pdb files by Dr.

Supporting information
All geometries and ORCA input files used in this work are provided in the ESI, alongside a dedicated spreadsheet containing our raw and calculated data. 20

Appendix A: Static Solutions to Dynamic Problems
The current paper presents an attempt for drawing information on dynamic PL binding processes from informative, "static pictures". The latter are assumed to represent critical events, which must occur as part of biochemically-significant PL binding (see introduction).
It might be important to try and understand why one should, in fact, even expect such "static pictures" to be adequate for describing any dynamic process in the first place. Classical Eyring transition state theory (TST) has, in fact, provided chemists with this very possibility (i.e., reducing dynamic problems to static ones). 57 It allowed, for the first time, the prediction of reaction rates -which reflect dynamic information on "collisions" between molecular species -by means of reduction to static, well-defined molecular structures, corresponding to critical events along the reaction path (i.e., the forming of a reactive transition state). Indeed, the theory has been shown to offer great predictive power, despite lacking strict physical rigor in its original form (a fact which did not seem to particularly concern Eyring himself; later, more physically-rigorous versions of the theory are reviewed in Ref. 55 ).
The truth is, however, that chemists have been used to solving problems this way long before Eyring has published his theory: it may be argued that predicting chemical properties based on, e.g., relevant Lewis/Kekulé structures is founded on similar logic. Obviously, Kekulé structures of organic molecules cannot be expected to represent how the latter "look" in a dynamic environment: the electron density for a given molecule clearly undergoes many changes with respect to time, to the point where it is not even trivial to speak of a welldefined molecular structure that is maintained throughout dynamic processes (see, for instance, discussions in Refs. 58,59 ). However, some static structures do interest us because they somehow provide simple and elegant, practical answers for questions of great 21 importance. Such static structures are therefore of more interest than others, and their identity depends on particular questions we are interested in answering. In the context of chemical reactivity, for instance, energetic minima and maxima may indeed be of particular interest.
Other static structures may allow us to answer questions of different kinds (think, for instance, of solubility; energetic minima/maxima are often less-important than energeticallyinvariant structural features, such as ones corresponding to an "ability" of forming hydrogen bonds). Thus, chemistry may generally be perceived as the science which reduces dynamic molecular processes to static, informative molecular structures for the purpose of gaining desirable predictive power ( Figure 1A). Some thoughts regarding harnessing TST to PL systems have been later expressed by Pauling. 60 Trying to rationalize the catalytic "benefit" of enzymatic reactions, he claimed that the latter increase the relative concentration of reactive transition state species compared to corresponding enzyme-free reactions. Regardless of the practical implications for such claim, the motivation behind it may be understood based on the above discussion of TST: instead of bothering ourselves with a vast amount of information associated with the entire PL complex, we may focus on specific features associated with a single reactive structure. Thus, if we wish to quantify a given enzyme's catalytic efficiency, all we need to do is compare the relative concentrations of such species in both enzyme/enzyme-free scenarios. Needless to say, the assumption according to which enzymatic reactions take place via single, welldefined reaction paths may indeed be exposed to criticism; however, the same chemicalintuition-driven motivation recognized in the very foundation of TST is again seen to offer 22 great explanatory value. Its practical utility, however, is somewhat difficult to assess -due to the fact that finding reactive species of interest using current computational techniques is often impractical.
Since Pauling's above claim had emerged, a few related attempts -based on similar argumentation in different practical "packaging" -have also been published. 56,[61][62][63][64] Still, searching for reactive transition-state structures, telling the "main story" of continuous PL processes, had continued to be the main bottleneck limiting the latter's applicability. That being said, the fact that practical applications of TST to PL systems have met with many difficulties does not mean that a static representation of such systems, having desirable predictive power, cannot be established. Proving that such possibility exists was, in fact, the main motivation for the present paper. Note that our work corresponds, in a sense, to an inverse version of TST -since we use bound PL complex structures (i.e., energetic minima) as a source of information on dynamic processes. Indeed, it turns out that these structures simply provide enough information for answering practical questions of interest (such as ones requiring comparative assessments of binding affinities), and that -from our point of viewplainly stands for yet another useful manifestation of chemical intuition.

Appendix B: Methodological Meditations
In order to facilitate the readers of this text, we hereby provide representative answers for important questions that may seem to arise from it. Both questions and answers have resulted, in fact, from actual discussions with colleagues from diverse scientific backgrounds.

Q1. LED vs. Physical Realism
• LED contributions do not represent realistic NCIs -which may not exhibit a welldefined intermolecular distance dependence (true dispersion, for instance, cannot be said to have the R -6 dependence as Edisp; see Ref. 65 and references therein). Besides, considering that no consistent level of theory was used for obtaining all calculated LED contributions -the latter cannot be summed to reproduce the "original" interfragment binding energy values. If that is not enough, geometries drawn from crystal structures do not represent true energetic minima on well-defined potential energy surfaces for the PL complexes at hand. Thus, the data used in M1-3 consist of 23 approximate quantities that are not rooted in quantum mechanics. How can it then be considered as a reliable source of predictive information?
Indeed, the data used in M1-3 may, perhaps, represent an approximation to the "true" NCIs taking place between each ligand and Tc AChE. That being said, and putting aside the fact that simple regression statistics confirm the informativeness of our predictors (see Q2 below), it is important to note that we had no intention of committing our scientific worldview to quantum mechanics alone. To us, electronic structure calculations are simply a tool for establishing general conclusions -aimed to benefit the science of chemistry. Thus, the use of approximate quantities (which were not derived from a fundamental, consistent physical model -embodied in a single, well-defined electronic structure method) does not harm any of the conclusions drawn in this paper (recall that the main question considered here is: is it possible to reduce PL binding to static chemical structures?).
On a different note, it should be emphasized that despite their seemingly-approximate nature, LED contributions do incorporate "high-resolution" information on NCIs -as they are derived exclusively from the nonempirical electronic structure calculated for a given system, using a well-defined protocol (in this context, see Cioslowski & Surján's "generalized observability criterion" in Ref. 66 ). Thus, and despite the fact they are obtained using different levels of theory, the latter may still be calculated and compared across different systems for the purpose of arriving at desirable conclusions -as we have chosen to do in this work.
In this context, what really matters is how the above data are being used, and the purposes for which they are used for. Since we have used total IEs and LED contributions as independent variables in MLR-based models, all we care about is comparing their corresponding values across all PL systems under consideration -as opposed to considering different energetic contributions calculated for a single PL complex. For this reason, the strict physical rigor associated with the LED approach is not of our current concern -contrary to its predictiveinformative potential. At last, it should actually be mentioned that the fact that the LED contributions do not fully add up to their corresponding binding energies does represent a practical advantage; it allows us to avoid intervariable linear-dependence issues that would prevent us from successfully applying MLR in the first place.
As previously mentioned in the introduction, all geometries considered in this work simply correspond to informative static structures (hypothetical energetic minima, perhaps?), that can be used for predicting experimental binding affinities (that is, to offer useful and practical understanding of PL systems). Indeed, finding these structures by means of well-converged, computational geometry optimizations would probably prove to be difficult (for reasons similar to those responsible for the impracticalness of Pauling's idea, as mentioned in Appendix A). However, considering the true nature of information and methods used in this paper, it is possible to conclude that doing so should not even be necessary: the crystal structures used by us, which result from statistical averaging of many individual complex structures and clearly do not correspond to such well-defined minima -have provided us with the information we are interested in for achieving our goals. Following a similar logic, previously-unknown informative molecular structures may practically be found through, e.g., a combination of semi-converged geometry optimizations and simple chemical intuition. The question whether a specific structure truly corresponds to a physically-realistic energetic minimum should not also not be of particular concern -as long as it is capable of providing an informative estimate for a specific PL binding energy, as well as for particular NCIs involved in the binding process.
Q2. Causality vs. "Kitchen-Sink Regression" • In principle, any data may be fitted and used as predictors for reproducing the experimental Ki curve (Figure 3 It should be noted that it might just happen that some randomly-generated numbers will prove to reflect predictive information after such fitting process (a fact often incorporated into machine learning techniques 68  First of all, it is worth mentioning that we are not claiming that simple models such as M1-3 can be expected to provide useful predictions for just any PL system; while we do believe that our approach (outlined in both main text and appendices) should generally be considered for predictive purposes -we certainly suspect that the particular models considered in this work might be inadequate for some PL cases.
Still, some cases which correspond to reason (a) may also, perhaps, be statistically-reduced to specific NCIs between PL pairs (as binding specificity to a particular ligand must incorporate some correspondence of this sort). Similarly, reason (b) would indeed push for dynamic modeling as long as considered effects are expected to qualitatively change the very fundamental PL NCI-based correspondence; if that is not the case -there would be no necessary reason to go beyond the static NCI-based picture. Thus, our approach should, in principle, incorporate the above factors -as long as the PL binding process is dominated by the aforementioned NCI-based correspondence. This would clearly not be true for all possible PL binding processes -as may be illustrated using the following simple thought-experiment.
Consider, for instance, a protein that introduces some "random" component into the continuous binding process: o Case #0: Ongoing conformational changes, for instance, may "hide/reveal" the binding pocket independently from any relationship with a particular ligand. Such random component, however, should not -in principle -affect any prediction of relative binding affinities (as it is not ligand-specific and should affect any binding process).
o Case #1: Taking an additional step forward, it is possible to think of a protein which introduces some random and ligand specific such component. In this case, the bound PL structure may not offer enough information for useful predictions -and additional sources of information would then have to be considered.
We would certainly expect models M1-3 to fail in the latter case. Nevertheless, additional variables (obtained using electronic-structure calculations, or other sources of information) may also be used for arriving at useful predictive capabilities -and so our very general approach should not be suspected for having inherent, system-specific shortcomings.

Q4. Relationship between the current work and QSAR approaches
• Your MLR-based model seems similar to ones used in quantitative-structure-activityrelationships (QSAR) studies. Can you explain how your approach differs from previously-proposed QSAR schemes?
Indeed, it is possible to argue that some QSAR models are based upon a "philosophy" similar to ours; that is, chemical predictions are offered based on calculated structural features that are not assumed to remain invariant throughout a molecular process of interest. However, approaches of this sort are, to the best of our knowledge, inherently different than the one used in this work: 69 1. QSAR descriptors (a.k.a independent variables) are not often derived exclusively from electronic structure calculations on a bound complex structure. Thus, relationships between all predictors and response variable are not rooted in a consistent, uniform or intuitively-understandable physical picture.
2. QSAR descriptors are not explicitly chosen to reflect 'critical events' in particular molecular processes. In fact, particular QSAR descriptors are often chosen based on the statistical robustness of the resulting model alone, while the added information corresponding to these predictors is not explicitly discussed.
The fact that fitting techniques such as MLR are used for constructing predictive QSAR schemes may certainly suggest some similarities between the latter and the simple models (M1-3) used in our work; however, our emphasis lies on electronic-structures-and chemical-28 intuition-based predictor selection. For all it is worth, it should be possible to say that we have chosen to implement our perception of chemistry in a QSAR envelope. The main idea, however, on drawing predictive information on complex molecular systems from static electronic-structure calculations is independent of such particular implementation and should be considered as the main message of our text.

Q5. Implications on Chemical Bonding
• You attempt to statistically-reduce a continuous, dynamic biochemical process to information drawn from selected static "electron-density snapshots" of isolated molecular species. It is possible to argue that the very notion of a chemical bond is also based on somewhat similar statistical reductions. Should your current effort actually be considered as an attempt for generalizing bonding concepts to PL systems?
Despite not being quite aware of this point in the beginning of our investigations, our above reasoning and inferences do resemble predictive arguments involving notions of chemical bonding. In a sense, chemical bonds do not exist in a dynamic molecular system (as any definition of a "bond" would lie on a static electron-density picture). Still, this concept helps us chemists to arrive at astonishingly-useful practical predictions (of, e.g., reactivity, solubility, and the like -depending on the particular questions considered -as mentioned in