SupraFit-An Open source Qt based fitting application to determine stability constants from titration experiments

Model AbstractTitrationModel AbstractItcModel Michaelis-Menten-ModelTitrationModel AbstractItcModel Michaelis-Menten-Model NMR/UV-VIS Models ITC Models Blank Model Monomolecular Kinetics legend green box: Qt class blue box : regular class red box : abstract class with pure virtual functions Fig. 2 Inheritance relationship in SupraFits model implementation. To implement a new model, a C++ class has to be derived from AbstractModel class and the most important virtual functions have to be implemented. ITC and NMR titration apart from confidence calculation.40–42 The application to calculate confidence intervals has been reported for titration experiments by Thordarson15 and in general by Motulsky and Christopoulos.43 The confidence intervals from Monte Carlo simulations are obtained using the percentile method, which has been discussed alongside with resampling methods by Efron.38 Efron noted, that the section dealing with confidence intervals ”is highly speculative in content.”38 The basic idea of the Monte Carlo approach is to theoretically repeat the performed experiment several times (T ). A single theoretic step is being realised by adding a random error εi to ycalc,i and then obtain a new set of data mimicking the original experimental data including realistic errors. These data can be used to estimate a new set θ . Performing these steps T times is denoted as Monte Carlo (MC) simulation within this context. Two main approaches to define the errors ε are implemented in SupraFit, (a) they are calculated from the standard normal distribution ε ∈ N(μ = 0,σ2) or (b) randomly chosen from the absolute errors obtained after the successful fit (ε ∈ e). The later will be called bootstrapping (BS) in SupraFit and may be interpreted as a mixture of a typical Monte Carlo simulation and resampling technique. Bootstrapping is one of the resampling plans discussed by Efron.38,44 More recent discussions and problems using the bootstrapping method can be found in Canty et al.45 and in Efron and Hastie.46 The applied standard deviation σMC in approach (a) can be taken from the SEy, σ f it or as manually defined value, where SEy is the default choice as proposed by Motulsky and Christopoulos since it is the corrected standard deviation (eq. 31). The 1− 2α confidence interval for each model parameter is then calculated using the percentile method: θi,− = ˆ CDF −1 (α) θi,+ = ˆ CDF −1 (1−α) (33) which results in the 95% confidence interval if α = 0.025. In SupraFit, this is realised by collecting all model parameters for each Monte Carlo step and then take α · T and (1−α) · T entry of the ordered list of the corresponding parameter. More advanced percentile methods, which are available in octave or R, are not implemented, so for a smaller number of T the results differ from those obtained with the standard approach using the quantile function in octave or R.¶ More robust methods will be implemented in future releases. Efron proposed 2000 steps as minimum for bootstrapping methods,46 which is taken as standard for all Monte Carlo simulations in conjunction with the percentile method. Since Monte Carlo simulations are parallelised,|| it benefits from the multicore architecture of modern desktop computers. Monte Carlo results are then reported as histogram-like charts as printed in Fig. 3. The box represents the 95% confidence interval, the dash-dotted line the estimated parameter. The individual bins are not plotted as typical bars but rather as a line-plot. Fig. 3 Standard representation of a histogram-like chart obtained after performing a Monte Carlo simulation. Alternatively to the variation of ycalc, Thordarson proposed the variation of input data, which are the initial concentrations of host and guest molecules in case of NMR titration.15 This derivation can be performed alongside with standard Monte Carlo simu¶ See https://octave.org/doc/v4.0.1/Descriptive-Statistics.html. Last visit 17.01.2022. || Monte Carlo simulation are spawned across the threads, that roughly each thread performs T/NT hreads optimisation.


Introduction
After the work of Pederson, Lehn and Cram in the second half of the 20th century (nobel prize in 1987 "for their development and use of molecules with structure-specific interactions of high selectivity"), supramolecular chemistry has become a popular field of research. The experimental determination of association constants utilising supramolecular titration experiments plays a big role in the analytical zoo of this research area. Several software packages have been written in the last three decades, each having its own strength and weaknesses. In times of open science, open data and open source software, some of these older software solutions might be considered as not state-of-the-art. The most recent tool for supramolecular titration experiments has been developed by the group of Thordarson and is available via www.supramolecular.org (last checked 17.01.2022) as online service. As a matter of taste, someone could prefer online tools to offline applications and vice verse. As an alternative to the older offline applications and as well as to the online tools, a newly written software package for supramolecular titration experiments called SupraFit is reported.
SupraFit is written in C++ utilising the Qt Software Development Toolkit 1 and the Eigen Library. 2 SupraFit is mainly developed for NMR titration and ITC experiments, providing methods to globally and locally analyse 1:1, 2:1/1:1 and 1:1/1:2 complexes out of the box. Fully statistical analysis based on Monte Carlo simulation and F-Test approaches with a good scaling on multicore systems are implemented as well as an intuitive user a Institut für Organische Chemie, Technische Universität Bergakademie Freiberg, Leipziger Strasse 29, Freiberg, Germany. E-mail: Conrad.Huebler@gmx.net † Electronic Supplementary Information (ESI) available: Selected images with higher resolution (Fig. S1 -S4). Input data for simulated 1:1/1:2 model. Selected images with higher resolution (Fig. S5). Calculated intersection in Mole Ratio plot (Table S1). Selected images with higher resolution (Fig. S6). Representation of boxplots from Monte Carlo simulation for NMR titration (Fig. S7 -S14). Comparison of initial guessed and least-squared estimated parameters in ITC experiments (Table S2). Isotherms and dilution experiments (Fig. S15). Estimated parameters and confidence interval with different dilution strategies (Table S3 - interface to deal with several models on single data sets. Due to being open source, own models can be implemented in the source code, with all functionality eg. statistical analysis being provided for the new models.

Software
Several packages already exist for the analysis of NMR titrations or ITC data, some of them did not receive updates or improvements recently. Additionally, these programs may provide statistical analysis, which are not always comparable to each other as they are based on different theories. A third point is the advantage of software to run on different operating systems (OS) or even being independent of an OS, although Windows systems dominate the PC market.
oped Sabatini, Vacca and coworkers, providing tools for different methods such as NMR titration, ITC and spectrophotometry. Hyp-NMR runs on Windows system and information on how to obtain the software are available upon request. ‡ The most recent version according to their website (http://www.hyperquad.co.uk/hypnmr.htm, last checked 17.01.2022) is HypNMR2018, without pointing out the differences to the older versions.
M. Maeder and P. King founded Jplus Consulting in 2009 and provide a software packages called ReactLab to analyse and simulate for example equilibrium titrations and kinetics. The software is based on a combination of MatLab and Excel and is available for purchase. More information can be found on their official web site: (https://jplusconsulting.com (last checked 17.01.2022).
Open Data Fit 11 is a collection of online services provided by P. Thordarson, where titration data can be analysed. The service can be accessed at http://opendatafit.org (last checked 17.01.2022). For now, supramolecular experiments 12 and a demo version for cell viability 13 are available. The kinetics service is under construction. 14 BindFit, the part focusing on supramolecular titration, was initially provided as free MatLab scripts included in the tutorial review by Thordarson 2011. 15 The latest version supports analysis of NMR and UV/VIS titration of typical 1:1, 2:1 and 1:2 systems, with the python source code being available at https://github.com/echus/supramolecular-apps (last checked 17.01.2022). New features such as Monte Carlo simulation based statistics are announced for future versions.

ITC
NanoAnalyze is available from TA Instruments, that assemble and sell instruments for several analysis (thermal, microcalorimetric and rheologic analysis). NanoAnalyze is freely available for Windows systems, provides several binding models, analysis of thermograms and statistics based on Monte Carlo simulations.
Harms et al. 16 released pytc (python itc) as open source software, built on top of python3 to analyse ITC data, having the most important binding models already implemented. The project is hosted on GitHub: https://github.com/harmslab/pytc (last checked 17.01.2022). Since it is written in python3, other models can easily be added. Statistical methods like F-Test or Information Criterion [17][18][19][20] methods are implemented and can be used to determine the performance of the models. An graphical user interface using PyQt5 can be downloaded separately at https://github.com/harmslab/pytc-gui (last checked 17.01.2022).
SEDFIT and SEDPHAT form a program package to globally analyse ITC data (gITC), with powerful statistical analysis based on Monte Carlo simulations or the F-Test approach. 21 It is freely available at https://sedfitsedphat.nibib.nih.gov/software/default.aspx (last checked ‡ See http://www.hyperquad.co.uk/index.htm for more information on how to obtain parts of the Hyperquad software products. 17.01.2022), however other systems apart from windows are not supported. Thermogram analysis can be performed with NITPIC (http://biophysics.swmed.edu/MBR/software.html last checked 17.01.2022) from Keller et al. 22 and then imported into SEDFIT.

Supramolecular Titrations
The theory of complexation and supramolecular titration is already reviewed in articles by Thordarson,15,23 as well as in text books like Analytical Methods in Supramolecular Chemistry 24 but the main aspects will be summarised here:

General Approach
Starting from the general mass balance equations (eq. 1 and 2) for a two-component system, the relationship between the concentration of two components [A] and [B] can be described through the cumulative stability constants (eq. 3). For example individual stability constants for a system with two complex species A a B b defined with a = b = 1 and a = 2, b = 1 read as in equation 4.
Depending on the values for l and m, e.g. the stoichiometry of molecules of A and B that are involved in forming the complex, different systems can be described. SupraFit reports all stability constants as individual logarithmic constants lgK, in contrast to other software that may report them as plain stability constants K in M −1 or as cumulative constants β .

Determining stability constants
The determination of association constants with titration experiments is based on the idea, that each component influences the response signal: Assuming a linear relationship between the amount of species and the response signal, equation 5 can be formed, where each component X i contributes to the overall signal y by a factor Y i .

NMR Titration
Upon performing 1 H-NMR titration, the chemical shift of specific protons bound to X (eg. receptor) changes during complexation due to non-covalent interactions with another component. Depending on the kinetics of the complex formation, fast and slow exchange can be observed. SupraFit, as most of the other applications, can only handle fast exchange, where the observed signal is the weighted average of all signals of the specific proton in the components, e.g. the shift of a proton assigned to the isolated receptor and one to the complex.
Since the relative change of the chemical shifts is of interest, it is defined as the ratio of each component to the reference component: in following case using the first component. Equation 5 reads for NMR titration as follows: On the other hand, for the slow exchange, for each component a signal for the specific proton can be observed, where the intensity is related to the amount of the species. 25

UV/VIS Titration
In the UV/VIS titration, the overall absorbance is the sum of the individual extinction coefficient ε i multiplied by concentration of each component. The equation holds true for low concentrations that fulfill Lambert-Beers Law.

General aspects
The basic part of isothermal titration calorimetry is the observation of the change of heat due to a complex formation in a reaction cell while keeping the temperature constant. The guest component B is sequentially added to a solution of the host component A. Details on that method can be found in literature of Freire,26,27 in Analytical Methods of Supramolecular Chemistry 28 by Schmidtchen, as well as in reviews by Thordarson. 15,29 The basic ITC equation 8 describes a sum over all formed complex species multiplied with corresponding heat of formation. In contrast to NMR and UV/VIS titrations, the pure host signal does not contribute to the observed heat. At the current state, SupraFit only makes use of models, that are of fixed stoichiometry and equal to the well known NMR titration models, that are summarised in section 3.3. Furthermore, SupraFit handles titration experiments with both, a fixed-volume set up as well as a set up with variable volume.

Handling dilution effects
Since upon each injection of B the concentration of B itself changes, an amount of signal can be lead back to a heat of dilution (Q d ), that cannot be neglected. Assuming a linear relationship between the concentration of B and the response heat signal, one can use equation 9 to add blank effects to the experiment (eq. 10), as done for example in pytc. 16 As a consequence, different approaches to deal with the dilution can be realised using SupraFit: 1. Using equation 10, two parameters are introduced (m δ and n δ ) and fitted alongside with the stability constants and the heat of formation to the experimental titration curve. An additional blank experiment does not have to be performed.
2. The two blank parameters (m δ and n δ ) are obtained from an independent dilution experiment and are added as fixed terms to equation 10.
3. The result of the independent dilution experiment is subtracted from the titration experiment and which is used to fit the parameters in equation 8 afterwards.
4. The blank parameters are fitted to a blank experiment and the titration simulaneously, while the stability constants and the heat of formation are fitted to the titration experiment only (eq. 10).

Thermogram handling
SupraFit provides ready-to-use thermogram integration functions with elementary baseline corrections for *.itc and plain thermogram files consisting of columns with time and heat per time, respectively. The baseline is separately calculated for each peak as a linear function, where the integration range can be adjusted manually. In case of very unregular baselines, different software packages may be more sufficient, such as NITPIC or software provided by the hardware supplier. After integration using third party software, the plain data can be processed with SupraFit.

1:1 Model
The simplest form of complexes with two components are the 1:1 complexes (a = 1, b = 1), which are formed according to equation 11. K 11 denotes the step-wise complex formation constant. The approach is sketched in Appendix B, resulting in equation 12.

2:1/1:1 Model
A model of 2:1/1:1 stoichiometry is defined through the following relationship: The stepwise stability constants K 11 and K 21 combine to the cumulative association constants as follows: The solution for the concentration of A is given in equation 47 in the appendix. 15 The corresponding equations to describe a 2:1/1:1 model used within SupraFit are summarised in Table 2, with the guest molecule being silent. In case of ITC experiments, 2:1/1:1 are not used regularly, but have already been reported. 30

1:1/1:2 Model
The 1:1/1:2 system is defined through following law of mass action: The concentration of unbound guest can be calculated analogously to the 2:1/1:1 systems using equation 50, 15 where the free host concentration can be determined using the mass-balance equations for 1:1/1:2 system.
Having the free and complexed host concentrations, the signals are calculated in SupraFit using the equations in Table 3, with the guest molecule being silent.

2:1/1:1/1:2 Model
The last titration model implemented in SupraFit is the mixed model with 2:1, 1:1 and 1:2 species. AB The solution of this system is defined by the mass-balance equation The mass balance equation can be simplified and reads as: The solution to this equilibrium system is obtained using an iterative procedure: The initial concentrations are guessed as Alternatively to this algorithm, methods to solve any equilibria system based on a Gauss-Newton optimisation have been published. 31 A Levenberg-Marquardt optimisation has been tested in SupraFit, but was disabled. § § During Monte Carlo simulations the Levenberg-Marquardt optimisation was not as efficient as the approach described above. However, a detailed benchmark was not prepared.

Cooperativity
Cooperative effects describe increasing or decreasing step-wise bindings constants in multi-step systems and have been discussed in the literature. 29,32,33 Following the notation of Thordarson, 11,15,29 four different types can be distinguished: full, noncooperative, additive and statistical. These models can be applied to 2:1 and 1:2 complex species in the mixed models in SupraFit.
The different kinds of relationship that can be set up in the model options are summarised in Table 5. Table 5 Different cooperative binding models define the relationship of the estimated model parameters. The relationships are taken from Hibbert and Thordarson, 2016. 11 K 2 refers to either K 12 or K 21 , depending on the stoichiometry of the complex. Similar, δ ∆2 refers to the signal of either the 2:1 or 1:2 species, whereas δ ∆1 denotes the 1:1 species

Michaelis-Menten Theory
Michaelis-Menten theory is usually used to describe how the rate r of an enzymatic reaction, that transforms a substrate S to a product P (eq. 25), depends on the amount of substrate S 0 . 34 The rate is defined as At high concentrations of S, the rate r tends towards v max . A linearised form of the Michaelis-Menten equations, the Lineweaver-Burke form (eq. 27), is usually used to determine K M and v max .
SupraFit provides a model to determine K M and v max using nonlinear regression. The starting guess is calculated using eq. 27.

Nonlinear least-squares regression
The set of unknown parameters θ , that are used to describe the relation of the independent data x and the experimental data y exp (eq. 28), have to be adjusted to minimise the sum of squared errors (SSE, eq. 29). In case of NMR titrations θ corresponds to the stability constants and chemical shifts each component, x to the concentrations and y to the observed chemical shifts. In connection with ITC experiments θ refers again to the the stability constants as well as the heat of formation and optional to the dilution parameters. The integrated peaks of the a thermogram form y and the concentrations remain to be the independent parameters x.
For the nonlinear problem, the Levenberg-Marquardt Algorithm 35,36 as implemented in Eigen, is used.
y exp,i denotes the experimentally observed value at i, y calc,i the estimation of the observed value according to the model parameter and e i the residual at each data point. The parameters θ are henceforth referred as toθ in case they are the best-fit parameters after least-squares optimisation. Characterisation of the fit can be realised using the standard deviation of the residuals σ f it (eq. 30), SE y (eq. 31) and χ 2 (eq. 32): 15 SE y is the corrected standard deviation with respect to the number of parameters (k) in the applied model.

General
SupraFit reads simple Table files as well as *.itc files. For the later, the thermogram import is straight forward. Additionally, data simulation and basic experimental planning are available 1-20 | 5 with the current functions. More details on the usage of SupraFit are available in the quickstart, that can be downloaded on the GitHub webpage at https://github.com/conradhuebler/SupraFit.

Technical aspects and implementation
SupraFit is written in C++ relying on the C++14 standard and should be compilable on every platform, that is supported by Qt and Eigen. The model implementation makes use of objectoriented programming to easily implement new models. It is out of the scope of this article to deal with the detailed implementation, but a short summary will be given: The source code is separated into four parts: (1) the core components containing the models, source code for optimisation and collected mathematical tools. Statistical analysis is implemented in the second part (2). Both parts, (1) and (2), are independent of any user interface and provide the functionalities for the pure command line application suprafit_cli.exe (3) and the graphical user interface suprafit.exe (4).
The core part holds the functionality to store the experimental data (DataClass), that is realised using a shared data pointer. Model preparation is done in the abstract class AbstractModel, that is based on that DataClass. Therefore, each implemented model inherits from AbstractModel and DataClass, respectively (Fig. 2). In the specific model implementation, the equations of the model and the number of input parameters have to be defined, as well as the names of each parameter. More details can be found in the source code documentation for the AbstractModel, AbstractTitra-tionModel and Michaelis-Menten-Model. 37 Parallelisation is mostly done using the threads concept utilising QThreadPool and QRunnable, but individual parts use openMP. Data storage is done using the JSON Format (*.json) or Zip compressed JSON (*.suprafit).

Confidence Intervals
Parameter (θ ) estimation is the main question in regression, as it allows the rational analysis and comparison of data sets and experiments. Yet the knowledge of θ is often not sufficient for rational analysis, 38 as the best fit values may differ for several performed experiments. The confidence interval of a parameter θ i estimates the range [θ i,− , θ i,+ ], within which the true parameterθ i can be expected. However, the standard approach used in (multiple) linear regression cannot be applied for non-linear problems. SupraFit provides two basic routes to approximate the confidence interval, both being described in the literature before. Explicit references will be given in each section. One of the goals of this article and SupraFit regarding the statistical tools is not to have one correct way to calculate confidence intervals, but rather present the already known techniques, provide an easy way to access those and show some examples on how these methods can be applied to parameter estimation problems.

Confidence Intervals by Monte Carlo simulations and percentile method
A powerful tool, that is used in many fields of science is the Monte Carlo simulation. 39 It has already been applied to both ITC and NMR titration apart from confidence calculation. [40][41][42] The application to calculate confidence intervals has been reported for titration experiments by Thordarson 15 and in general by Motulsky and Christopoulos. 43 The confidence intervals from Monte Carlo simulations are obtained using the percentile method, which has been discussed alongside with resampling methods by Efron. 38 Efron noted, that the section dealing with confidence intervals "is highly speculative in content." 38 The basic idea of the Monte Carlo approach is to theoretically repeat the performed experiment several times (T ). A single theoretic step is being realised by adding a random error ε i to y calc,i and then obtain a new set of data mimicking the original experimental data including realistic errors. These data can be used to estimate a new set θ . Performing these steps T times is denoted as Monte Carlo (MC) simulation within this context.
Two main approaches to define the errors ε are implemented in SupraFit, (a) they are calculated from the standard normal distribution ε ∈ N(µ = 0, σ 2 ) or (b) randomly chosen from the absolute errors obtained after the successful fit (ε ∈ e). The later will be called bootstrapping (BS) in SupraFit and may be interpreted as a mixture of a typical Monte Carlo simulation and resampling technique. Bootstrapping is one of the resampling plans discussed by Efron. 38,44 More recent discussions and problems using the bootstrapping method can be found in Canty et al. 45

and in Efron and
Hastie. 46 The applied standard deviation σ MC in approach (a) can be taken from the SE y , σ f it or as manually defined value, where SE y is the default choice as proposed by Motulsky and Christopoulos since it is the corrected standard deviation (eq. 31). The 1 − 2α confidence interval for each model parameter is then calculated using the percentile method: which results in the 95% confidence interval if α = 0.025. In SupraFit, this is realised by collecting all model parameters for each Monte Carlo step and then take α · T and (1 − α) · T entry of the ordered list of the corresponding parameter. More advanced percentile methods, which are available in octave or R, are not implemented, so for a smaller number of T the results differ from those obtained with the standard approach using the quantile function in octave or R. ¶ More robust methods will be implemented in future releases. Efron proposed 2000 steps as minimum for bootstrapping methods, 46 which is taken as standard for all Monte Carlo simulations in conjunction with the percentile method. Since Monte Carlo simulations are parallelised, || it benefits from the multicore architecture of modern desktop computers. Monte Carlo results are then reported as histogram-like charts as printed in Fig. 3. The box represents the 95% confidence interval, the dash-dotted line the estimated parameter. The individual bins are not plotted as typical bars but rather as a line-plot. Alternatively to the variation of y calc , Thordarson proposed the variation of input data, which are the initial concentrations of host and guest molecules in case of NMR titration. 15  lations. To the best of the authors knowledge confidence interval calculations have not been reported for this derivation, however percentiles can be calculated in the same way.

Confidence Intervals using the F-Test approach
The F-Test approach to confidence intervals has first been proposed by Box 47 and Beale, 48 and further outlined by Beechem 49 as well as Bates and Watts. 50 Taking the least-squares estimated set of parametersθ , the confidence interval then includes all values θ that are equal to the best-fit estimationθ . This can be formulated as following hypotheses H 0 : θ =θ and the alternative H A : θ =θ . The decision is based on the F-Test (eq. 34), where the ratio of SSE(θ ) and SSE(θ ) has to be smaller than the value, that defines the (1 − α)·100% confidence interval. 51 In equation 34, K refers to the number of parameters, N to the number of data points and F α N,N−K to the critical value in the Fdistribution for the given degrees of freedom and desired confidence interval. A graphical interpretation is given in Fig. 4. The sum of squares has a minimum at SSE(θ ) and can be decreased to θ i,− or increased to θ i,+ while the error is smaller than SSE max .  43 Keller at al. 52 published an Excel-Guide to apply the F-Test to Michaelis-Menten Kinetics using the Weakened Grid Search. SupraFit provides both approaches to the F-Test, that will now be introduced:

Weakened Grid Search
Having K parameters to be analysed, the first θ i is changed by small δ θ ** and then fixed, while the remaining θ j =i are optimised. The parameter θ i is changed again by δ θ and the θ i = j parameters are estimated anew. This is to be repeated as long as SSE(θ ) is smaller than SSE max and therefore H 0 is not rejected. This procedure is performed for all parameters in the same manner and all θ that satisfy equation 35 define the confidence region. 15 In SupraFit some additional parameters are introduced to control the procedure, like the maximum number of steps, the step size and the convergence threshold for the sum of squares. A comprehensive list is given in the manual of SupraFit and a short description of each parameter is shown as tooltip in the SupraFit program. Obtained results are graphically presented as shown in

Model Comparison
An alternative way to the F-Test approach is denoted as Model Comparison. During MOC calculations θ i is varied by an amount of δ θ while the remaining θ i = j are not optimised, but systematically varied to fullfill equation 35. The parameter θ i is then again changed by δ θ and the remaining θ i = j are varied to meet the condition in equation 35. This is repeated until the change of θ i disobeys equation 35. After performing this approach for all K parameters, the limits of the confidence region can be extracted from all obtained values of θ i as θ i,− = min(θ i ) and θ i,+ = max(θ i ). 43 Assuming that there is only one parameter to be optimised, applying WGS and MOC as described in Motulsky and Christopoulos, 43 both methods perform similarly: θ k will be varied by ±δ θ until SSE(θ k ) reaches the maximum possible SSE and the tuple (θ k,+ , θ k,− ) correspondence to the confidence interval. This approach of continuously varying one parameter is implemented in SupraFit as Simplified Model Comparison ** δ θ is both, positive and negative so that θ i is tested for values smaller and greater thanθ i .
(SMOC). Like WGS, the Simplified Model Comparison benefits from multiple processes, since each parameter is evaluated in a single thread. Instead of systematic variation, SupraFit provides the Model Comparison as Monte Carlo experiment just like the calculation of an arbitrary area: Uniform random numbers are generated within defined boundaries for every θ i , where these random parameters are stored if SSE(θ ) meets equation 35. The confidence interval is then defined by the minimum and maximum values for all θ . The implementation works as follows: Simplified Model Comparison is applied to each parameter and the confidence interval ] is obtained. The intervals are scaled by variable parameters, which define a rectangular box in case of two variables, a cuboid for three parameters etc. (dash-dotted box in Fig.  6). Uniform random numbers are generated within the interval defined by the box and checked if they obey equation 35. If they do, the parameters are kept, otherwise they are discarded. An ideal confidence interval is represented in Fig. 6 as red ellipsoid, with the maximum values for θ 1 and θ 2 form the limits of the confidence interval. Similar to previous methods, Model Compar- ison is parallelised, where amount of Monte Carlo steps is equally divided across the threads.

Resampling Methods
Cross Validation (CV) is a powerful tool, applied for example in QSAR in conjunction with principal component selection. 53 In SupraFit, CV will be applied to determine the sufficiency of the used model. Another method, not yet described and applied to supramolecular titration experiments is called "Reduction Analysis." Both methods will be introduced in a subsequent article, that focuses on a statistical approach to analyse binding stoichiometry.

Linear Regression Tool
SupraFit provides a linear regression tool for experimental data, that can be used to fit several linear functions to experimental data. The data points are continuously divided: In case of three functions, the first function is fitted using the first n 1 ...n i data points, the next functions uses the next n i+1 ...n j data points and the last functions uses the remaining n j+1 ...n N data points. The maximal number of functions is N/2, where each function is described by two points. The currently implemented method tests all available combinations and returns an ordered list. One field of use will be shown for NMR titration, to create Mole Ratio plots. Another application will be shown within the ITC examples.

Global fitting
Programs like pytc or SEDFIT provide methods to perform a global fit, 16,21 that is to fit a single set of parameters to more than one experiment. In that fashion, analysing several signals in NMR titrations is already a global fit, 15 since one formation constant is connected to two or more signals. While a global fit for NMR titration is straightforward, combining several ITC experiments is performed with MetaModels in SupraFit. MetaModels are empty container models, that can hold and manage real models. Model parameters can be handled individually or any in combination thereof. However, the first approach is identical to a local fit. Statistical analysis or global search can be performed on MetaModels in the same way as on simple models. An example of MetaModels will be discussed in the ITC section.

Uncorrelated parameters
An example using a function with two uncorrelated parameters θ 1 and θ 2 is used to illustrate the preceding aspects of the statistical analysis. The function in equation 36 acts on the element m of the vector having M elements: Thus, θ 1 acts on the first half of the interval while θ 2 acts on the second half. Depending on the values for θ , the function is discontinuous at m = M/2. In the range of {0.05, 0.05+0.05, ..., 1.95− 0.05, 1.95} with θ 1 = 1.8801 and θ 2 = 7.4043, after adding a random error (ε ∈ N(0, 0.25)), the function (eq. 36) is drawn in Fig.  7. The 95% confidence intervals using F-Test based methods applying the (Simplified) Model Comparison and Weakened Grid Search approach are given in Table 6. Both have been applied to either parameters individually (MOC a , WGS a ) or to both (MOC b , WGS b) ) together. The F-Test confidence intervals are effectively the same, independent of the approach, with some numerical differences due to step size during the evaluation. Using Monte Carlo simulation (ε = SEy) with the percentile method, the confidence interval is much narrower than these obtained with the F-Test approach. Those differences were already pointed out by Motulsky and Christopoulos. 43 The variation of the individual parameters θ i by ±δ θ ,i and the 1-20 | 9  corresponding SSE for SMOC and WGS are shown in Fig. 7(c) and 7(d). In both charts, the series show a parabolic trend, indicating that the SSE max can be reached during variation. The correlation coefficient for θ 1 and θ 2 and the scatter plots ( Fig. 8) after MOC, WGS and MC clearly indicate that there is no dependency between both parameters, which is in agreement with the given function. The obtained correlation coefficient for θ 1 and θ 2 is 3.6 · 10 −5 after Model Comparison. Using WGS the accepted values for θ 1 and θ 2 show a correlation coefficient of zero. The lines display two sets of accepted values for θ 1 and θ 2 , where one parameter is not affected by changing the other. The model parameters after Monte Carlo simulation indicate no correlation (R 2 = 1.9 · 10 −4 ) as well, but the pairs of θ 1 and θ 2 do not form a complete ellipse as obtained after Model Comparison. However, in case of functions or models with uncorrelated parameters the implemented F-test based approaches lead to practically identical results, which differ from the Monte Carlo simulation based results.

Correlated parameters
A function where θ 1 and θ 2 are not independent is given in equation 37. The same input data are used as in previous example, where θ 1 = 4.8321 and θ 2 = 8.5912. Random error (ε ∈ N(0, 0.25)) is added to simulate experimental noise.
The corresponding diagrams are plotted in Fig. 9, including the graphical interpretation of the SMOC and WGS approaches. The confidence intervals, that are calculated similarly to the previous example, are summarised in Table 7: SMOC and MOC a result in the same confidence interval, and MOC b and both WGS a and WGS b result in the same confidence intervals, however different from the first one. This is expected, since SMOC and MOC a take only one parameter into account and fix the remaining, while MOC b and WGS take both parameters into account. The confidence intervals after Monte Carlo simulation are narrower than the WGS/MOC b confidence intervals, but broader than the SMOC and MOC a intervals. The graphical interpretation of SMOC and WGS are shown in Fig. 9(c) and 9(d). While all series again show a parabolic trend, the series for θ 1 or θ 2 differ slightly for both methods. The correlation between θ 1 and θ 2 can be analysed using the correlation coefficient and the scatter plots as shown in Fig. 10. Apart from the different confidence intervals, the ellipsoid after MOC is rotated with respect to the ellipsoid in Fig. 8a and correlation can be observed (R 2 = 0.70). The scatter plot after WGS shows two lines again, where each line is assigned to the variation of one parameter. The correlation coefficient indicates a strong correlation (R 2 = 0.98), which however is an artefact since only the best-fit values are included but not all possible values that obey equation 35. Monte Carlo simulation on the other hand leads to a similar scattering of the parameters and a very similar correlation coefficient (R 2 = 0.70).
Having two parameters (θ k and θ l ) and performing WGS for only one parameter θ k , the F-Test confidence interval for the corresponding parameter is obtained since θ j is always adjusted. However, performing the MOC and limiting it to one parameter θ k , the confidence interval will always be smaller or equal to the correct F-Test confidence interval, since at SSE(θ MOC k ) = SSE max there is still the other parameter θ l to be adjusted. If there is no   correlation between θ k and θ l , both parameters can be varied independently of each other and the F-Test confidence interval of the Simplified Model Comparison and WGS are equal.

Linear Regression
In case of linear models, the (1-α) confidence intervals can be calculated with standard software like Excel or similar spreadsheet programs as well as statistical software like R. In Table 8, we report the confidence intervals for a linear model using the t-distribution approach calculated using Gnumeric 54 as well as the approaches for non-linear models implemented in SupraFit.
The data used were obtained adding ε ∈ N(µ = 0, 0.01) to a linear model y = θ 1 x + θ 2 with θ 1 = −820.000 and θ 2 = −0.333 (Fig.  11). The least-squares estimated parameters are θ 1 = −787.551 and θ 2 = −0.334. The non-linear F-Test based confidence interval differs much from the smaller linear t-distribution bases interval. Monte Carlo simulation with T = 50000 steps was performed as bootstrapping and using SE y and σ f it as input standard deviation. The BS confidence interval is the smallest and the interval using SE y is the widest, since SE y > σ f it . However, the obtained confidence intervals after Monte Carlo simulations are very close to the one calculated with the linear approach, being only slightly smaller. Using SE y as ε for Monte Carlo simulation recovers the linear approach best.

NMR Titration
To demonstrate the application of SupraFit in case of NMR titration, example calculation on an artificial NMR titration with a 1:1/1:2 binding stoichiometry were performed. The stability constants to set up the experimental data were chosen to be lgK 11 = 3.81 and lgK 12 = 2.14 The chemical shifts can be found in the supporting information. The individual shifts are not meant to represent a realistic example. A random error obtained from a normal distribution with ε ∈ N(µ = 0, 0.001) was added afterwards, where every single signal has the same σ , therefore e.g. signal 6 (∆δ = 2.3038 ppm) and signal 7 (∆δ = 0.2441 ppm) have both the same random error. The "experimental" titration curve can be found in Fig. 12(a). The four possible models (1:1, 2:1/1:1, 1:1/1:2 and 2:1/1:1/1:2) were tested without cooperative relationships.

Mole Ratio Plot
Using SupraFits linear regression method with two functions, Mole Ratio plots can easily be generated. Since the typical plots exhibit the chemical shift on the x axis and the ratio on the y axis, the plots obtained using SupraFit differ from the "standard" plots. However, as the chemical shifts depends on the ratio and not vice versa, SupraFit provides only the "non-standard" way. The plot can be found in Fig. 12(b). For each series, all possible intersections of adjacent linear functions are calculated. The result for the best fit, that fit minimising the sum over all SSE, is listed in the supporting information. The intersections of the two functions per signal ranges between 1.13 and 1.27, indicating a system that exhibits 1:2 species. This is in accordance with the stoichiometry of the original model.

Fitted parameter
The resulting stability constants (lg K) after optimisation are printed in Table 9, statistical judgements using SSE and SE y can be found in Table 10. The titration curve as well as the remaining absolute errors can be found in Fig. 13. The complex formation constants for the correct model differ only slightly from the initial ones. The easier 1:1 model estimates a lgK 11 that is too small, as happens upon fitting the 2:1/1:1 model. The most complex model resamples lgK 11 and lgK 12 , but the incorrect model parameter lgK 21 is realistic. Some of the chemical shifts in the 2:1/1:1 are smaller than zero (δ A 2 B,1 = −6.6334 ppm), indicating a change in the chemical shift up to 13 ppm (δ AB,1 = 6.5565 ppm). A full list of all parameters can be found in the example file in the SupraFit repository at GitHub. The "visual inspection" as described by Hynes,9 can be performed using the charts in Fig. 13(c) and 13(d), where all absolute errors are plotted in Fig. 13(c)   13(d). Clearly the 1:1 model perform worst, followed by the 2:1/1:1 model, with both having heteroscedastic errors. The remaining two models are optically indistinguishable with both errors being homoscedastic. † † Considering the resulting SSE, the decision towards the correct model can already be made, since SSE 1:1/1:2 ≈ SSE 2:1/1:1/1:2 and 3 · SSE 1:1/1:2 < SSE 1:1 . 15 Comparing SSE of the fitted 1:1/1:2 model and the correct model show the slightly smaller error for the optimised model.

Monte Carlo Confidence Intervals
Following the strategy of the Monte Carlo simulation, the introduced error can be calculated from the standard normal distribution with (a) a defined variance or (b) via bootstrapping. To test the influence of different approaches on the confidence interval, a set of simulations were performed on the given dataset with the optimised 1:1/1:2 model. The standard normal distributed errors were generated with σ MC = SE y , σ MC = σ Fit , σ MC = 1e −3 , σ MC = 2e −3 , σ MC = 3e −3 and σ MC = 5e −3 . Monte Carlo simulation with T = 100, 200, 300, 500, 700, 1000, 1500, 2000, 2500, 3000 and 5000 steps were performed, where each simulation was repeated 300 times. The 95% confidence interval was then characterised by the median and standard deviation of the 0.95 interpercentile ranges (IPR) for these 300 Monte Carlo simulations. The boxplots and the standard deviation of the 0.95 IPR values for the stability constants lgK 11 and lgK 12 after the Monte Carlo simulation are reported in Fig. 14 and show expected behaviour: With increasing steps T, the observed standard deviation of the IPR decreases. The same trend is visible for the other Monte Carlo simulations including BS (see Fig. S7 -S13). With increasing step count the IPR converges to the ideal IPR that could be obtained after an infinite number of steps. As Efron stated, 46 at least 2000 steps are required for the bootstrap method to obtain reliable results. However, since every Monte Carlo step requires the least-squares estimation of θ , this approach is demanding. As shown in Fig. 15, Monte Carlo simulation scales well with the number of threads used and benefits from Hyperthreading technology. ‡ ‡ Therefore, accurate Monte Carlo simulation with 2000 † † This is expected as they resample the original normal distributed random numbers. ‡ ‡ The benchmark was obtained on a i9-7920X CPU with 12 cores overlocked to 4.00GHz, using openSUSE 15.0 Leap. SupraFit was compiled using gcc 7.4.1. steps can easily be obtained within minutes even on a desktop computer with fewer cores. As shown in Fig. 16 the confidence intervals obtained from 300 Monte Carlo simulations with each simulation performed with 5000 steps using BS or random errors and different σ MC differ. As σ MC increases, the confidence interval gets broader and standard deviation of the IPR increases. However, the differences between bootstrapping and random error with σ MC = σ f it are very small but since the Kruskal-Wallis-test results in a p-value = 0.002 < 0.05 for lgK 11 and p = 0.023 < 0.05 for lgK 12 , the differences are significant for the given example. The corresponding plots for lgK 12 are presented in Fig. S14.

Correlation of lgK 11 and lgK 12
Since the current NMR titration model has more than two parameters, the correlation of lgK 11 and lgK 12 will be analysed either neglecting or taking the parameters, e.g. the chemical shifts, into account. Therefore, a Monte Carlo simulation with σ MC = SE y and T = 10000, two runs of Weakened Grid Search, the first only for lgK 11 and lgK 12 and the second for all parameters and Model Comparison for lgK 11 and lgK 12 were performed. The scatter plots for lgK 11 vs lgK 12 are shown in Fig. 17 and the confidence intervals are given in Table 11. The first two charts show the scattering of the complex formation constants after applying the Weakened Grid Search, where Fig. 17(a) contains only two series, since only two parameters were tested. However, the chart in Fig. 17(b) shows more than two series, as all parameters were taken into account. Incorporating more parameters, the correlation coefficient drops from 0.80 to 0.74 since more points from the original series are available. However, the high correlation is an artefact as already pointed out in the example of the function with correlated parameters in the previous section. The scatter plot after Monte Carlo simulation in Fig. 17(c) shows an ellipsoid, with the parameters having a correlation coefficient of 0.37. On the other hand, using Model Comparison with only taking two parameters into account, one obtains a complete ellipsoid, which however is rotated with respected to the Monte Carlo ellipsoid and to the series obtained after Weakened Grid Search (Fig. 17(d)). Therefore, naive Model Comparison leads to wrong results regarding confidence intervals and the ellipsoid, if correlated parameters are ignored.

Isothermal titration calorimetry
The ITC data used in the following section are taken from the pytc-demo.
The complex formation of Calcium with EDTA (see https://github.com/harmslab/pytc-demos, last checked 17.01.2022) where reported by Harms et al. 16 to demonstrate the pytc tool. The heat is given in cal and cal/mol. In the first part, the initial guess of the parameters in case of a 1:1 model are described, since a good starting point for the non-linear regression is essential.
The f x value, the inflection point of the titration curve, 55 is guessed by fitting three non-overlapping linear functions to the isotherm. The guessed f x value is then obtained as mean of the intersection of first with the second function and the second with the third function (Fig. 18). The heat of formation is calculated using the heat of the third injection Q 2,3 divided by the change in concentration of the added guest [B] component. It is assumed, that at the start of the titration the concentration of the formed complex is nearly the same as the added guest concentration since [B] [A]. The stability constant is then calculated using the bisection method within the limits of 1 ≤ lgK 11 ≤ 10. The initial guessed parameters of the 1:1 model are applied to the models of mixed stoichiometries as well. See Table S2 for the comparison of the initial guessed and fitted parameters for the hepes data.

Global Fit
MetaModels were used to globally fit lgK 11 and δ H AB to the data of hepes-01, hepes-02 and hepes-03 from the pytc-demo that are followed by Monte Carlo simulation to estimate the confidence intervals. These results were then compared to the confidence intervals obtained from Monte Carlo simulations for the individual experiments. The obtained parameters and the confidence intervals using Monte Carlo simulation (σ MC = SE y , 5000 steps) are listed in Table 12. While the globally estimated lgK 11 is nearly the mean of the individual models (7.595), the IPR for lgK 11 can not be approximated by the mean of the individual IPR (0.065). The

Dilution
The same example data set from pytc was used to analyse the effect of the dilution experiments on the parameter estimation. The four approaches, described in section 3.2.3, were applied: As first approach (1) the titration was analysed with dilution correction, included according to equation 10 but without referring to any external blank titration. Including dilution using another experiment was realised as follows: (2) An external blank titration was used to estimate the two dilution parameters m δ and n δ in equation 10, which were included and kept constants while lgK 11 , ∆H and f x were obtained. The third parameter estimation (3) was performed using equation 8 after the blank experiment was subtracted from the complexation experiment. In the last experiment (4) the dilution and the complexation experiment were combined as MetaModel. Therefore m δ and n δ were estimated using the dilution and the titration experiment globally, while lgK 11 , ∆H and f x were estimated locally, using only the data from the titration experiment. The corresponding isotherm and blank experiment are shown in Fig. 19, the estimated parameter for hepes-01 are listed in Table 13. The heat observed from the dilution experiment is very small, compared to the heat from binding experiment. Fig. S15 contains the three isotherms and dilution experiments for the hepes-01, imid-01 and tris-01 data sets. See Tables  S3 -S5 for all best fit values as well as the confidence intervals of the parameters lgK 11 and ∆H AB . Monte Carlo simulation with 20000 steps and σ MC = SE y were performed, the corresponding boxplots for lgK 11 and ∆H AB in case 1-20 | 15  11 and lgK 12 included, R 2 = 0.24 Fig. 17 The resulting scatter plots for lgK 11 and lgK 12 differ for various statistical approaches.

Fig. 18
The initial value for fx is guessed using three linear functions.   Fig. 20. Boxplots including all parameters and the data sets imid-01 and tris-01 can be found in the supplementary information in Fig. S16 -S20.
In the hepes-01 data set, the differences between neglecting the dilution and strategy (1) are very small in case of the estimated values for lgK 11 and ∆H AB . However, Monte Carlo simulations reveal, that there is an influence on the confidence intervals. For both, imid and tris data, the differences between the estimated parameters (lgK 11 and ∆H) and the corresponding confidence intervals comparing the neglected dilution and strategy (1) are much more intense (see Fig. S16 and S17). The results after explicitly including the dilution experiment in the parameter estimation following the three remaining approaches show that all three methods result in different best-fit parameters as compared to none dilution and strategy (1). However, the Monte Carlo simulations indicate, that the subtraction of the results of the hepes blank experiment deteriorates the statistical parameters of the obtained values for lgK 11 and ∆H compared to strategy (2) and (4). In the imid and tris data sets, similar broadened confidence intervals as indicated by IPR and σ were not observed (see Table  S16 and Fig. S17). This can be explained by using the correlation coefficient for the linear fit of the dilution experiment ( Fig. 19(b) and Fig. S15), where R 2 is worst for hepes (R 2 = 2.88 · 10 −4 ) and better for imid (R 2 = 1.58·10 −3 ) and tris (R 2 = 6.52·10 −3 ) dilution data.
It was demonstrated, how the influence of various approaches to include blank experiments can be analysed using Monte Carlo simulation. In the present example, the obtained parameters only change on a very small scale, e.g. the heat of complexation varies in scales of less than 0.5 kcal/mol due to the small heat of dilution. This may however not be true in general and statistical post-processing can help to understand the obtained results more deeply.

Conclusion
A new graphical program to perform non-linear regression with focus on the calculation of stability constants by means of NMR titration and ITC experiments has been presented. The software is written in C++, using the Qt Toolkit and the Eigen library and is fully open source and therefore transparent regarding the underlying mathematics and algorithms. Additionally to the pure estimation of the various physical parameters, that are used to describe the complexation process, statistical analysis can be performed to obtain confidence intervals for each single parameter and to gain a deeper insight in the performed experiments. The adoption of several techniques are reported, which are already described in the literature (Monte Carlo simulation and F-Test approaches), however the routinely usage of these approaches has not been reported yet. We hope, that SupraFit provides a good basis to analyse titration experiments with respect to the statistical judgement and to further improve the insight in the supramolecular systems. We additionally aim to provide SupraFit as easyas-necessary and as powerfull-as-possible regarding the usability of the user interface, that all the tools brought by SupraFit are straightforwardly accessible. Contributions like new models or statistical post-processing are welcome. The source code and binaries of SupraFit can be obtained free of charge from the GitHub repository at https://github.com/conradhuebler/SupraFit.

Conflicts
There are no conflicts to declare.

Systems of 1:1/1:2 stoichiometry
The mass balance equations are formed similarly to the other systems, with β 12 = K 11 K 12