Inclusion of Control Data in Fits to Concentration–Response Curves Improves Estimates of Half-Maximal Concentrations

Concentration–response curves, in which the effect of varying the concentration of a compound on the response of an assay is measured, are widely used to evaluate the biological effects of chemical compounds. While the National Center for Advancing Translational Sciences guidelines specify that readouts should be normalized by the controls, the recommended statistical analyses do not explicitly fit to the control data. Here, we introduce a nonlinear regression procedure based on maximum likelihood estimation that determines parameters for the classical Hill equation by fitting the model to both the curve and the control data. Simulations show that the proposed procedure provides more precise parameters than previously prescribed practices. Analysis of enzymatic inhibition data from the COVID Moonshot demonstrates that the proposed procedure yields a lower asymptotic standard error for estimated parameters. Benefits are most evident in the analysis of incomplete curves. We also find that removing outliers identified with Lenth's method and refitting appears to determine parameters more precisely.


■ INTRODUCTION
Some data are just underappreciated. Maybe they look different or come from a different background from most other data. Maybe they do not fit neatly into common notions of what data on a "curve" should look like. Whatever the case, they are pigeonholed into a restricted role that limits their contributions. But if people would only give them the opportunity, they could improve the entire fit. This is the case for control data in concentration–response curves (CRCs). CRCs describe the response of a biological assay to different concentrations of a chemical compound. When the assay measures the response of an organism, the data are termed a dose–response curve. CRCs are an important step in evaluating the biological effects of a chemical compound. Widely used in pharmacology and toxicology, CRCs have become even more ubiquitous with the emergence of high-throughput screening.
CRC data are often summarized by the concentration required to induce a response halfway between the minimum and maximum. For inhibition assays, the IC50 specifies the half-maximal inhibitory concentration. For an agonist/stimulator assay, the EC50 is the concentration of substance that elicits half of the maximal response. IC50/EC50 values are often used to classify the potency of a compound against a target. For example, drugs may be categorized as high potency (below 1 μM), medium potency (between 1 and 10 μM), or low potency (above 10 μM). 1 While it is possible to analyze CRC data and determine IC50/EC50 values in many ways, standard approaches that are considered best practices in the scientific community are codified in the Assay Guidance Manual (AGM). 2 The AGM is a free online resource written by over 100 authors and managed by the National Center for Advancing Translational Sciences with input from experts working in industry, academia, and government. Its scope includes "appropriate statistical ways to analyze assay results and accommodate minor changes to assay protocols to ensure robustness." 3
According to the AGM, 2 CRC data should first be normalized by control data. As described in the section "Data Standardization for Results Management", the specific type of control data depends on the type of assay. In enzymatic inhibition assays, negative controls contain the enzyme, including cofactors and substrate(s), but no putative inhibitor. Positive controls also include substrate(s) and cofactors, but the enzyme is either absent or fully inhibited. Regardless of the assay, normalization is performed by subtracting the minimum from the response, dividing by the range (the difference between the maximum and minimum), and multiplying by 100 to obtain a percentage. For most assays, the minimum is the negative control and the maximum is the positive control. (The exceptions are in vitro functional assays of agonists and inverse agonists, for which the AGM recommends fitting to the CRC of a reference agonist to obtain the minimum and maximum.) After normalization, the model most commonly fit to the CRC data is the classical Hill equation. As CRC data are usually monotonic with a sigmoid shape on a semilog plot, most curves are reasonably modeled by a logistic function in which the independent variable $x$ is the logarithm of the concentration $c$, $x = \log_{10}(c/c^{\theta})$, where $c^{\theta} = 1$ M is the standard concentration. The classical Hill equation scales the logistic function to interpolate between the minimum response at the bottom of the curve, $R_b$, and the maximum response at the top of the curve, $R_t$, as
$$r(x,\theta) = R_b + \frac{R_t - R_b}{1 + 10^{H\,(x - x_{50})}} \tag{1}$$
where $x_{50}$ is the base 10 logarithm of the concentration that induces the half-maximal response, $H$ is the Hill slope, and $\theta = \{R_t, R_b, x_{50}, H\}$ denotes the full set of parameters for the equation. The $x_{50}$ corresponds to the base 10 logarithm of the IC50 or EC50, depending on the type of assay.
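As an illustration only, eq 1 and the AGM-style normalization fit in a few lines of Python. This is a minimal sketch that assumes the sign convention of eq 1 as written above (for $H > 0$ the response approaches $R_t$ as $x \to -\infty$ and $R_b$ as $x \to \infty$); the function names `hill`, `log_conc`, and `normalize` are hypothetical and do not belong to any package.

```python
import math


def hill(x: float, r_t: float, r_b: float, x50: float, h: float) -> float:
    """Classical Hill equation (eq 1): tends to r_t as x -> -inf and to r_b
    as x -> +inf when the Hill slope h is positive."""
    return r_b + (r_t - r_b) / (1.0 + 10.0 ** (h * (x - x50)))


def log_conc(c_molar: float, c_standard: float = 1.0) -> float:
    """x = log10(c / c_theta), with the standard concentration c_theta = 1 M."""
    return math.log10(c_molar / c_standard)


def normalize(r: float, minimum: float, maximum: float) -> float:
    """AGM-style normalization: subtract the minimum from the response,
    divide by the range, and multiply by 100 to obtain a percentage."""
    return 100.0 * (r - minimum) / (maximum - minimum)


if __name__ == "__main__":
    x50 = log_conc(2e-6)  # an IC50 of 2 uM gives x50 = log10(2e-6)
    print(round(x50, 3))  # -5.699
    print(hill(x50, 100.0, 0.0, x50, 1.0))  # 50.0, halfway between R_b and R_t
    print(normalize(30.0, 10.0, 50.0))  # 50.0
```

On this scale, the "high potency (below 1 μM)" class mentioned above corresponds to $x_{50} < -6$.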
While the control data are used for normalization, statistical analyses recommended by the AGM do not explicitly include control data in curve fitting. The AGM goes into considerable detail about fitting the classical Hill equation to CRC data. Fitting to competitive binding and functional assay data is discussed in the section "Assay Operations for SAR Support". Fitting to in vivo assay data is described in section 3.5 of the chapter "In Vivo Assay Guidelines". In both chapters, the AGM suggests fitting the four-parameter logistic (4PL) model, in which all of $\theta$ is allowed to vary, to most CRC data. The control data are not included in the fit and therefore have no effect on the estimated $x_{50}$ or $H$. For some data, the AGM recommends the three-parameter logistic fixed bottom (3PLFB) model, where $R_b$ is fixed at zero, or the three-parameter logistic fixed top (3PLFT) model, where $R_t$ is fixed at 100. Conceptually, three-parameter models make the most sense when control data can be reasonably construed as measurements of the minimal response, $R_b$, or the maximal response, $R_t$. For example, in an enzymatic inhibition assay, if the response is proportional to the concentration of a product, then the negative control without an inhibitor is a measurement of $R_t$. Fixing $R_b$ or $R_t$ can strongly influence the estimated values of both $x_{50}$ and $H$, especially for incomplete curves. 4 While three-parameter fits do not explicitly incorporate control data, these procedures actually give the control data a privileged position: they essentially set the value of a parameter based on the controls rather than on multiple concentrations along the curve.
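To make the distinction between the 4PL, 3PLFB, 3PLFT, and 2PL models concrete, the sketch below fits eq 1 by ordinary least squares with SciPy's `curve_fit`, holding some parameters at the fixed values named in the text. It assumes data already normalized to the 0–100 scale; the helper names (`FIXED`, `fit_hill`) and the starting guesses are illustrative choices, not settings prescribed by the AGM or by any package. As in the text, none of these procedures uses the control data in the fit itself.

```python
import numpy as np
from scipy.optimize import curve_fit

PARAM_NAMES = ("r_t", "r_b", "x50", "h")

# Parameters held fixed by each established procedure (normalized 0-100 scale).
FIXED = {
    "4PL": {},
    "3PLFB": {"r_b": 0.0},
    "3PLFT": {"r_t": 100.0},
    "2PL": {"r_b": 0.0, "r_t": 100.0},
}


def hill(x, r_t, r_b, x50, h):
    """Classical Hill equation (eq 1), vectorized over x."""
    return r_b + (r_t - r_b) / (1.0 + 10.0 ** (h * (x - x50)))


def fit_hill(x, r, procedure="4PL"):
    """Least-squares fit of eq 1 to curve data only, with the parameters
    listed in FIXED[procedure] held constant. Returns all four parameters."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    fixed = FIXED[procedure]
    free = [name for name in PARAM_NAMES if name not in fixed]
    guess = {"r_t": r.max(), "r_b": r.min(), "x50": float(np.median(x)), "h": 1.0}

    def model(xx, *values):
        params = dict(fixed)
        params.update(zip(free, values))
        return hill(xx, **params)

    popt, _ = curve_fit(model, x, r, p0=[guess[name] for name in free])
    estimate = dict(fixed)
    estimate.update(zip(free, popt))
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.repeat(np.linspace(-9.0, -4.0, 11), 3)  # 11 concentrations x 3 replicates
    r = hill(x, 100.0, 0.0, -6.5, 1.0) + rng.normal(0.0, 5.0, x.size)
    for procedure in FIXED:
        est = fit_hill(x, r, procedure)
        print(procedure, {k: round(float(v), 3) for k, v in est.items()})
```

Fixing a parameter removes it from the optimization entirely, which is why a fixed $R_b$ or $R_t$ can pull $x_{50}$ and $H$ when the curve is incomplete.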
Explicitly including control data in curve fitting, neither unduly ignoring nor privileging them, is a simple and intuitive idea. However, we found only two papers that evaluated whether it is a good idea. Weimer et al. 5 compared various fitting procedures applied to androgen receptor binding assay data and to simulated data with similar parameters. While negative controls were included in all fits, positive controls were either included or excluded. They found that including positive controls in fits to experimental data most significantly changes the estimate of $R_b$. In fits to simulated curves, including positive control data improved the precision of the $x_{50}$ estimates (Table 2 of their paper). Thus, they recommended fits including both positive and negative controls. Kappenberg et al. 6 considered the problem of deviating controls, in which the response of the negative control deviates from the response at low concentrations. In a literature review, they found deviating controls to be a common occurrence in the toxicological literature. They found that if deviations are small, the influence of fitting with or without controls depends on whether the curves are complete, with response values near both $R_t$ and $R_b$. For complete curves, inclusion of controls had little influence on the fraction of estimates deemed acceptable. On the other hand, including control data was clearly beneficial for the analysis of incomplete curves.
Curve fitting including control data is only rarely mentioned in software documentation and the affiliated literature. Among commercial packages, the documentation of Origin 7 does not mention control data, whereas the GraphPad documentation 8 mentions a suggestion from Weimer et al.: 5 even if software is unable to accept concentrations of zero and infinity, control data may be incorporated by assigning them extremely high (for positive control data) or low (for negative control data) concentrations. The oft-cited book "Fitting models to biological data using linear and nonlinear regression: A practical guide to curve fitting", 9 coauthored by the founder and chief executive officer of GraphPad, spans 351 pages. As with the AGM, the book uses control data to fix curve parameters. It also describes how CRCs from control compounds may be used for global fitting and to evaluate whether parameters for a compound are different from those of the control compound. However, it does not discuss the inclusion of control conditions in curve fits.
Several papers have described software tools for fitting CRCs. 10–15 These papers have focused on how the tools make curve fitting freely available and easier to use, 11,12,15 fit data with more complex multiphasic models, 10,14 and perform Bayesian uncertainty quantification. 13 Other papers since 2010 have described how to make curve fitting more robust via an evolutionary algorithm, 16 a systematic search of parameter space, 17 or outlier detection. 18 In only a few of these papers is fitting to control data mentioned at all. One of these mentions is in the paper about the popular drc package 12 within the statistical computing environment R, 19 which as of April 17, 2023 had received 1555 citations according to the Web of Science. It explains that model functions are implemented such that they are defined at a concentration of zero. eeFit also formulates the Hill equation to avoid a singularity at zero concentration. 15 However, these papers do not explain the benefits of including control data in curve fits.
Here, we describe and evaluate a method for incorporating control data into CRC fits. Assuming that measurement error is Gaussian distributed, our method maximizes the likelihood of observing all of the data given the full set of parameters $\theta$. We also assume homoscedasticity, that is, that all the data on the curve have the same variance, but the approach may be readily generalized to heteroscedastic data. Because they may contain a different number of species and require a different number of dilution operations, controls are assumed to have a variance different from that of the curve. Without control data, the maximum likelihood estimate of model parameters is obtained by the standard procedure of nonlinear least-squares regression, which minimizes the sum of squared deviations between the model and the observed data. When control data are included, on the other hand, the estimated model parameters are distinct (Figure 1) and, as we show, usually more accurate. We demonstrate the method on simulations of complete and incomplete CRCs. We also apply the proposed procedure to 1830 CRCs from two types of SARS-CoV-2 main protease enzymatic inhibition assays that were collected for the COVID Moonshot, an open-source effort to develop a COVID-19 antiviral. 20 Lastly, we describe the results of an outlier detection method.

■ RESULTS AND DISCUSSION
As detailed in the Experimental Section, curve fitting was performed with five methods: our proposed statistical approach and up to four established procedures. Our proposed approach may be summarized as a four-parameter fit to a logistic function with controls (4PL+C). The established procedures are a four-parameter fit without controls (4PL); a three-parameter fit with a fixed bottom such that $R_b = 0$ (3PLFB); a three-parameter fit with a fixed top such that $R_t = 100$ (3PLFT); and a two-parameter fit with a fixed bottom and fixed top such that $R_b = 0$ and $R_t = 100$ (2PL). Simulations were analyzed with all five procedures, while biochemical assay data were analyzed with the 4PL+C, 4PL, and 3PLFB procedures. In analyses with the four established procedures, data were normalized based on the sample mean of the controls. With the proposed approach, data were normalized based on $R_b^\circ$ and $R_t^\circ$ obtained from the fit.
The 4PL+C Procedure Improves the Accuracy of All Parameter Estimates Based on Simulated Data. The proposed 4PL+C procedure results in parameter estimates that have a lower error than all other tested statistical analysis procedures. For all tested procedures, the distribution of parameter estimates appears to be Gaussian and symmetric about the true value (Figures S1 and S2 of Supporting Information 1). For a set of $N$ estimates $\hat{\theta}_n$ of a parameter $\theta$, with $n \in \{1, 2, 3, \ldots, N\}$, the mean square error (MSE) is
$$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{\theta}_n - \theta\right)^2$$
This metric accounts for both the variance and the bias in an estimate. However, as the mean values of all parameters are indistinguishable from the true values, none of the estimators shows a detectable bias, and differences between procedures are due to variance. The 4PL+C procedure leads to a reduction in MSE that is especially pronounced for $R_b$ (∼0.17 of 4PL) and $R_t$ (∼0.48 of 4PL) and smaller but still statistically significant for $x_{50}$ and $H$ (∼0.75 of 4PL for both) (Table 1). The change in MSE is larger than the standard deviation across 5 independent sets of curves. While there are statistically significant benefits to using 4PL+C to analyze complete curves, the improvement is even more evident for incomplete curves. We obtained incomplete curves by removing data corresponding to the two highest concentrations of the simulated complete curves (Figure 2).
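The MSE definition above, and the statement that it combines variance and bias, can be checked numerically with the identity $\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance}$, which holds exactly when the variance uses a denominator of $N$. The sketch below is illustrative; the function name and the simulated estimates in the usage block are hypothetical and do not reproduce Table 1.

```python
import numpy as np


def mse_bias_variance(estimates, true_value):
    """Return (MSE, bias, variance) for N estimates of one parameter.

    MSE = (1/N) * sum_n (theta_hat_n - theta)^2 = bias^2 + variance,
    where the variance uses denominator N (ddof=0) so the identity is exact.
    """
    est = np.asarray(estimates, dtype=float)
    mse = float(np.mean((est - true_value) ** 2))
    bias = float(est.mean() - true_value)
    variance = float(est.var())  # ddof=0
    return mse, bias, variance


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_x50 = -6.5
    # Hypothetical unbiased estimators that differ only in their spread.
    procedure_a = rng.normal(true_x50, 0.10, 5000)
    procedure_b = rng.normal(true_x50, 0.08, 5000)
    mse_a = mse_bias_variance(procedure_a, true_x50)[0]
    mse_b = mse_bias_variance(procedure_b, true_x50)[0]
    print("MSE ratio (b relative to a):", round(mse_b / mse_a, 2))
```

When the bias is negligible, as in the simulations described here, the MSE ratio between two procedures is essentially a ratio of variances.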
Removing these data increases the MSE of all parameter estimates (Table 2). While the increase in MSE is approximately 4-fold for the standard 4PL procedure, it is subtle for 4PL+C; using the positive control appears to mostly compensate for losing data points at the highest concentrations. Because 3PLFB also uses the controls to set $R_b$, this procedure also compensates for the loss of these data points, and its precision is similar to that of 4PL+C.
While including control data clearly improves the estimation of parameters in these simulations, the extent of improvement will depend on a large number of factors. These factors include the variances of the control and curve data. They also include the set of concentrations at which responses are measured, including the number of points along the curve and whether the curve is complete or incomplete. Hence, the reductions in MSE reported above may not be representative of all situations.
Our result that including control data is especially beneficial for the analysis of incomplete curves is consistent with previous observations. Kappenberg et al. 6 found that in the "difficult" situation of an incomplete curve, estimation with controls outperforms estimation without controls unless the controls are strongly deviating. Sebaugh 4 found that fixing $R_b$ or $R_t$ can strongly influence estimates of $x_{50}$ and $H$ in incomplete curves. As mentioned in the Introduction, fixing $R_b$ or $R_t$ privileges the control data by using them (and not the curve data) to set values for these parameters. The 4PL+C approach provides similar benefits for the analysis of incomplete curves without the subjective choice of which parameters to fix and without giving unwarranted privilege to the control data.
The Precision of Estimates from Repeated Experiments Is Consistent with Trends Observed in Simulations. The COVID Moonshot data include some repeated experiments. The fluorescence assay data include 9 independent CRCs for inhibition of Mpro by the compound with database identifier CVD-0002707. The mass spectrometry assay data include 25 independent CRCs for ebselen. The parameters estimated by applying 4PL versus 4PL+C are shown in Figure S3 of Supporting Information 1. In this figure, we compared 4PL+C to 4PL instead of 3PLFB because 4PL is more standard and more widely used. For both data sets, 4PL+C reduces the standard deviation of the estimated $R_b$ and $R_t$ compared to 4PL (Table S1 of Supporting Information 1). There was no statistically significant difference between the procedures in the standard deviations of pIC50 and the Hill slope. These results are inconclusive but consistent with the trends observed in the simulated data.
There is a large variation in the estimated Hill slope for ebselen.This is likely because ebselen is a covalent inhibitor.For a covalent inhibitor, the results are dependent on the time between incubation and measurement, which could be highly variable.This variability does not affect the comparison between fitting procedures, as the Hill slope reported for each curve is relatively insensitive to the fitting procedure (Figure S3 of Supporting Information 1).
4PL+C Is Reasonably Fast and a Simplification Is Widely Available. Although curve fitting with the 4PL+C procedure is slower than with 4PL, both are still fast. Performing fits for all 1668 fluorescence curves took 6 min and 2 s for 4PL+C but only 18 s for 4PL. For all 238 mass spectrometry curves, the fitting took 36 s for 4PL+C versus 8 s for 4PL. As described in the Experimental Section, the 4PL+C procedure alternates between optimizing the Hill equation parameters $\theta$ and the estimated variances. In contrast, the 4PL procedure requires optimizing only the Hill equation parameters. Nonetheless, the additional computing cost required for 4PL+C should not be a barrier to using the procedure.
While existing data analysis software does not, to our knowledge, implement 4PL+C, essentially any nonlinear regression software can perform a simplified version of 4PL+C. Some packages, like drc, 12 use model functions that are defined at a concentration of zero. Even in software that does not, Weimer et al. 5 noted that it is straightforward to include control data by assigning them extreme concentration values such that the model response is a close approximation to the asymptotes. This procedure leads to a simplification of 4PL+C in which the controls are assumed to have the same variance as the curve. However, at least for the COVID Moonshot data, this appears to be an unreasonable assumption.
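A minimal sketch of the simplified, equal-variance variant of 4PL+C described above: the control readings are appended to the curve data at stand-in log concentrations far beyond the tested range, and an ordinary 4PL least-squares fit is run on the combined data. The stand-in values `X_TOP = -20` and `X_BOTTOM = 20` are arbitrary assumptions, chosen so that for a positive Hill slope of order one eq 1 is numerically indistinguishable from its asymptotes there; they are not values recommended by Weimer et al. or by any software package. Because every point receives the same weight, this variant implicitly assumes that the controls and the curve share one variance.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical stand-in log10 concentrations for the controls.
X_TOP = -20.0     # top controls (no inhibitor), treated as x -> -infinity
X_BOTTOM = 20.0   # bottom controls (full inhibition), treated as x -> +infinity


def hill(x, r_t, r_b, x50, h):
    """Classical Hill equation (eq 1), vectorized over x."""
    return r_b + (r_t - r_b) / (1.0 + 10.0 ** (h * (x - x50)))


def fit_equal_variance_4plc(x, r, top_controls, bottom_controls):
    """Ordinary 4PL least squares on curve + controls placed at extreme x.
    Every observation has the same weight (a single shared variance)."""
    top = np.asarray(top_controls, dtype=float)
    bottom = np.asarray(bottom_controls, dtype=float)
    x_all = np.concatenate([np.asarray(x, dtype=float),
                            np.full(top.size, X_TOP), np.full(bottom.size, X_BOTTOM)])
    r_all = np.concatenate([np.asarray(r, dtype=float), top, bottom])
    p0 = [top.mean(), bottom.mean(), float(np.median(x)), 1.0]
    popt, _ = curve_fit(hill, x_all, r_all, p0=p0)
    return popt  # (r_t, r_b, x50, h)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Incomplete curve: the highest concentrations are missing.
    x = np.repeat(np.linspace(-9.0, -6.0, 7), 3)
    r = hill(x, 100.0, 0.0, -6.5, 1.0) + rng.normal(0.0, 5.0, x.size)
    top = 100.0 + rng.normal(0.0, 5.0, 16)
    bottom = rng.normal(0.0, 1.0, 16)
    print(fit_equal_variance_4plc(x, r, top, bottom))
```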
Variances Differ between the Curve and Controls. In the fluorescence and mass spectrometry CRC data collected for the COVID Moonshot, the estimated variances of the curve and the controls differ (Figure 3). In both assay types, the bottom control has the least variance. For the fluorescence measurements, the variance of the top controls is comparable to that of the curve. For mass spectrometry experiments, the variance of the curve is often higher than that of the top control. In both experiments, there is no correlation between the estimated variance of the top control and that of the curve (data not shown).
ᵃMean and standard deviation (in parentheses) of the MSE of parameters estimated from 5000 curves across 10 independent sets.
Differences in variance can be explained by the experimental settings, which have been described elsewhere. 20 Bottom controls contain buffer but no enzyme and are therefore less subject to concentration errors, which may arise, for example, from pipetting error or enzyme degradation. In the fluorescence measurements, the top controls and curves have comparable sources of error. In the mass spectrometry experiments, the top control contains only DMSO (no inhibitor) and does not have error due to the inhibitor concentration.
The 4PL+C Procedure Reduces the Estimated ASE of Experimental CRCs. For the vast majority of the fluorescence and mass spectrometry CRC data collected for the COVID Moonshot, 4PL+C yields a smaller asymptotic standard error (ASE) than 4PL. Examples of fitting curves using 4PL+C and 4PL can be found in Supporting Information 2. We fit 1589 of 1668 fluorescence curves and 205 of 238 mass spectrometry curves with a high coefficient of determination, $R^2 > 50\%$, using both procedures. In over 95% of the curves of both types, the ASE of $R_t$ and $R_b$ was lower using 4PL+C than 4PL (Figures 4 and 5). The 4PL+C procedure provides a lower ASE than 4PL in over 75% of the $x_{50}$ and $H$ estimates. The reduction in ASE suggests that 4PL+C determines parameters more precisely than 4PL.
Outlier Detection and Curve Refitting Can Improve the Precision of Parameter Estimates in Both 4PL and 4PL+C. For simulated CRCs, outlier detection and curve refitting did not improve the precision of the parameter estimates. Lenth's method 21 detected an outlier in about 40% of curves. However, removing the outliers and refitting had a negligible effect on the MSE of the estimated parameters, yielding results identical to those in Table 1. The normally distributed errors in the simulated curves do not lead to significant perturbations of the parameter estimates.
For data from both the fluorescence and mass spectrometry assays, removing outliers using Lenth's method 21 reduces the ASE for the vast majority of estimated parameters. Of the 1668 fluorescence assay curves, 590 had at least one outlier. To show the effect of outlier removal on the precision of each parameter, we plotted a histogram of the difference in the ASE before versus after outlier removal. For the majority of these curves, removing the outlier(s) and refitting the data led to a lower ASE (Figure 6). Of the 238 mass spectrometry curves, 94 had at least one outlier. For most of these, removing the outlier(s) and refitting led to a lower ASE (Figure 7). Similar results were observed when refitting with the standard procedure (Figure S4 in Supporting Information 1).
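The text does not spell out how Lenth's method 21 is applied to a curve, so the sketch below rests on an assumption: the method is applied to the residuals of an initial fit, a residual is flagged when its magnitude exceeds Lenth's margin of error, and the curve is refit without the flagged points. Lenth's pseudo standard error is $\mathrm{PSE} = 1.5\,\mathrm{median}\{|e_k| : |e_k| < 2.5 s_0\}$ with $s_0 = 1.5\,\mathrm{median}|e_k|$, and the margin of error is $t_{0.975,\,d}\,\mathrm{PSE}$ with $d = K/3$ degrees of freedom for $K$ values. The cutoff level and the helper names are illustrative and may differ from the procedure behind Figures 6 and 7.

```python
import numpy as np
from scipy import stats


def lenth_pse(values):
    """Lenth's pseudo standard error of a set of values (here, residuals)."""
    abs_v = np.abs(np.asarray(values, dtype=float))
    s0 = 1.5 * np.median(abs_v)
    return 1.5 * np.median(abs_v[abs_v < 2.5 * s0])


def lenth_outliers(residuals, alpha=0.05):
    """Boolean mask of residuals whose magnitude exceeds Lenth's margin of
    error, t_{1 - alpha/2, d} * PSE with d = K/3 degrees of freedom."""
    residuals = np.asarray(residuals, dtype=float)
    df = residuals.size / 3.0
    margin = stats.t.ppf(1.0 - alpha / 2.0, df) * lenth_pse(residuals)
    return np.abs(residuals) > margin


def drop_outliers_and_refit(x, r, fit, predict):
    """Fit, flag outlying residuals with Lenth's method, and refit without them.
    `fit(x, r)` returns parameters; `predict(x, params)` returns model values."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    params = fit(x, r)
    keep = ~lenth_outliers(r - predict(x, params))
    return fit(x[keep], r[keep]), keep


if __name__ == "__main__":
    # Toy example with a straight-line model standing in for the Hill fit.
    x = np.arange(12.0)
    r = 2.0 * x + 1.0
    r[5] += 30.0  # one gross outlier
    params, keep = drop_outliers_and_refit(
        x, r, fit=lambda xx, rr: np.polyfit(xx, rr, 1), predict=lambda xx, p: np.polyval(p, xx)
    )
    print(params, np.flatnonzero(~keep))
```

Any Hill-equation fitting routine with the same `fit`/`predict` shape could be substituted for the straight-line stand-in used in the usage block.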

■ CONCLUSIONS
As demonstrated by analyses of simulation and experimental data, the proposed 4PL+C procedure results in more accurate estimates of Hill equation parameters, including the half-maximal concentration and Hill slope, than the established regression methods. Benefits are especially evident in the analysis of incomplete curves. Moreover, we have shown that outlier detection and curve refitting lead to a lower ASE of the pIC50 and Hill slope for both 4PL+C and 4PL.
The proposed method provides clear benefits without a significant cost. No more data need to be collected than are required for standard statistical approaches. Implementation is only minimally more complex. The increase in computational expense is not prohibitive. While most curve fitting software can perform a variant of 4PL+C in which all control and curve data are assumed to have the same variance, this appears to be, at least for the COVID Moonshot data, a poor assumption. Thus, we recommend that data analysis software developers implement the described 4PL+C procedure, with different variances for the curve and the top and bottom controls. We also recommend that scientists apply it as a new standard, replacing 4PL.

■ EXPERIMENTAL SECTION
Statistical Approaches. Maximum Likelihood Estimation without Control Data. Our procedure is based on the maximum likelihood estimation of parameters that fit the Hill equation to CRC data. We use $\mathcal{D}_c$ to denote the experimental data on the curve, $\mathcal{D}_c = \{(x_i, r_{i,j})\}_{j=1,\ldots,n_i;\ i=1,\ldots,m}$. Here, the $x_i$ are base 10 logarithms of the concentrations of the compound at which response measurements are recorded, and we assume that there are $m$ unique $x_i$ values. At each $x_i$, there are $n_i$ repeated measurements of the response, $r_{i,j}$ for $j = 1, \ldots, n_i$. Therefore, there are $N_c = \sum_{i=1}^{m} n_i$ observations that are part of the concentration–response curve. The most common practice is to use the same number of replicates at each $x_i$, such that $n_i = n$ for $i = 1, \ldots, m$ and $N_c = m \times n$.
To obtain the likelihood function of the CRC, we assume that the data follow the classical Hill equation in eq 1 with an additive measurement error:
$$r_{i,j} = r(x_i, \theta) + \epsilon_{i,j} \tag{2}$$
The measurement errors $\epsilon_{i,j}$ are independently and identically distributed according to the normal distribution $\mathcal{N}(0, \sigma_c^2)$, where $\sigma_c^2$ is the variance of the measurement error. Unknown parameters include $\theta = (R_t, R_b, x_{50}, H)$ in the Hill equation and the nuisance parameter $\sigma_c^2$. Based on these model assumptions, the log-likelihood of the CRC data is
$$\log L_c(\theta, \sigma_c^2) = -\frac{N_c}{2}\log\left(2\pi\sigma_c^2\right) - \frac{1}{2\sigma_c^2}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[r_{i,j} - r(x_i,\theta)\right]^2 \tag{3}$$
The maximum likelihood estimator (MLE) is the solution of the following maximization problem:
$$(\hat{\theta}_c, \hat{\sigma}_c^2) = \underset{\theta,\,\sigma_c^2}{\arg\max}\ \log L_c(\theta, \sigma_c^2) \tag{4}$$
Note that for the parameters that maximize the likelihood, the gradient of the log-likelihood with respect to $(\theta, \sigma_c^2)$ is zero. This condition may be expressed in two equations:
$$\nabla_\theta Q(\theta) = 0, \qquad \sigma_c^2 = \frac{Q(\theta)}{N_c}$$
where $Q(\theta) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[r_{i,j} - r(x_i,\theta)\right]^2$. Due to the structure of $\log L_c$, the two equations can be solved separately. Since the Hill equation is nonlinear in $\theta$, it is suitable to use a nonlinear equation solver to solve the first equation; this is the typical nonlinear regression approach. Thus, the MLE of $\sigma_c^2$ is $\hat{\sigma}_c^2 = Q(\hat{\theta}_c)/N_c$. However, this MLE of $\sigma_c^2$ is biased. We instead use the unbiased estimator of $\sigma_c^2$ (exactly unbiased for models that are linear in $\theta$ and approximately so for the Hill equation),
$$\tilde{\sigma}_c^2 = \frac{Q(\hat{\theta}_c)}{N_c - p}$$
where $p = 4$ is the number of parameters in $\theta$.
Maximum Likelihood Estimation Using Control Data. In many cases, it is reasonable to use control data as measurements of extreme responses. Consider an enzyme inhibition assay. For a negative control without any inhibitor, $x$ is undefined because it is the base 10 logarithm of a zero concentration, so the data technically are not part of the curve. However, it is reasonable to treat the negative control data as corresponding to an extremely low concentration such that $x \to -\infty$. For a positive control with substrate(s) and cofactor(s) but no enzyme, the response should be the same as if the enzyme were present but fully inhibited by an extremely high concentration of inhibitor, such that $x \to \infty$.
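Returning briefly to the fit without control data, eqs 2–4 and the two variance estimators can be sketched as follows, assuming NumPy and SciPy are available. `scipy.optimize.least_squares` stands in for the nonlinear equation solver, and the starting values and function names are illustrative assumptions. The standard errors use the common Gauss–Newton approximation $\nabla^2 Q \approx 2J^{\mathsf{T}}J$, which drops the residual-weighted second-derivative term of the Hessian of $Q$ given under Inference of Standard Error, so they may differ slightly from exact Hessian-based values.

```python
import numpy as np
from scipy.optimize import least_squares


def hill(x, theta):
    """Classical Hill equation (eq 1); theta = (R_t, R_b, x50, H)."""
    r_t, r_b, x50, h = theta
    return r_b + (r_t - r_b) / (1.0 + 10.0 ** (h * (x - x50)))


def log_likelihood_curve(theta, sigma2_c, x, r):
    """Eq 3, with x repeated once per replicate so that len(x) == N_c."""
    q = np.sum((r - hill(x, theta)) ** 2)
    return -0.5 * r.size * np.log(2.0 * np.pi * sigma2_c) - q / (2.0 * sigma2_c)


def fit_4pl(x, r):
    """MLE of theta without controls (eq 4), variance estimates, and ASEs."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    theta0 = np.array([r.max(), r.min(), np.median(x), 1.0])
    sol = least_squares(lambda th: r - hill(x, th), theta0)  # minimizes Q(theta) / 2
    theta = sol.x
    q = float(np.sum(sol.fun ** 2))      # Q(theta_hat)
    n_c, p = r.size, theta.size
    sigma2_mle = q / n_c                  # MLE of sigma_c^2 (biased)
    sigma2_unbiased = q / (n_c - p)       # degrees-of-freedom-corrected estimator
    jac = sol.jac                         # Jacobian of the residuals at theta_hat
    cov = sigma2_unbiased * np.linalg.inv(jac.T @ jac)  # Gauss-Newton approximation
    return theta, sigma2_mle, sigma2_unbiased, np.sqrt(np.diag(cov))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = np.repeat(np.linspace(-9.0, -4.0, 11), 3)
    r = hill(x, (100.0, 0.0, -6.5, 1.0)) + rng.normal(0.0, 3.0, x.size)
    theta, s2_mle, s2_unb, ase = fit_4pl(x, r)
    print("theta:", np.round(theta, 3), "ASE:", np.round(ase, 3))
    z = 1.96  # z_{alpha/2} for alpha = 0.05
    print("95% CI for x50:", theta[2] - z * ase[2], theta[2] + z * ase[2])
```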
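We obtain the likelihood of the control data based on the assumptions that they correspond to extreme responses and that measurement error has a structure similar to that of the curve. The bottom control data $\mathcal{D}_b$ include the observations $r_{b,j}$ for $j = 1, \ldots, N_b$ and are considered as measurements in which $x$ is extremely large, such that $x \to \infty$. In this limit, the Hill equation yields $R_b$, and the measured responses are $r_{b,j} = R_b + \epsilon_{b,j}$, where the $\epsilon_{b,j}$ are the measurement errors. Because the controls may be measurements of solutions containing substances different from those on the curve, we cautiously assume that the measurement errors follow different normal distributions for the control and curve data. Specifically, the $\epsilon_{b,j}$ are independently and identically distributed according to $\mathcal{N}(0, \sigma_b^2)$, where $\sigma_b^2$ is a constant, the variance of the measurement error. Like $\sigma_c^2$, $\sigma_b^2$ is an unknown nuisance parameter. At the other extreme, the top control data $\mathcal{D}_t$ include the observations $r_{t,j}$ for $j = 1, \ldots, N_t$ and correspond to $x \to -\infty$. Based on the Hill equation in this limit, $r_{t,j} = R_t + \epsilon_{t,j}$, where the $\epsilon_{t,j} \sim \mathcal{N}(0, \sigma_t^2)$ are independently and identically distributed. Based on these assumptions, the log-likelihood of the bottom controls is
$$\log L_b(R_b, \sigma_b^2) = -\frac{N_b}{2}\log\left(2\pi\sigma_b^2\right) - \frac{1}{2\sigma_b^2}\sum_{j=1}^{N_b}\left(r_{b,j} - R_b\right)^2 \tag{5}$$
and that of the top controls is
$$\log L_t(R_t, \sigma_t^2) = -\frac{N_t}{2}\log\left(2\pi\sigma_t^2\right) - \frac{1}{2\sigma_t^2}\sum_{j=1}^{N_t}\left(r_{t,j} - R_t\right)^2 \tag{6}$$
The log-likelihood of the curve and control data is simply the sum of the log-likelihoods of each group of data:
$$\log L_{ctb}(\theta, \sigma_c^2, \sigma_t^2, \sigma_b^2) = \log L_c(\theta, \sigma_c^2) + \log L_t(R_t, \sigma_t^2) + \log L_b(R_b, \sigma_b^2) \tag{7}$$
The MLE of the parameters based on both the curve and control data is the solution of
$$(\hat{\theta}_{ctb}, \hat{\sigma}_c^2, \hat{\sigma}_t^2, \hat{\sigma}_b^2) = \underset{\theta,\,\sigma_c^2,\,\sigma_t^2,\,\sigma_b^2}{\arg\max}\ \log L_{ctb}(\theta, \sigma_c^2, \sigma_t^2, \sigma_b^2) \tag{8}$$
The two estimators $\hat{\theta}_c$ and $\hat{\theta}_{ctb}$ are different, as the former is based only on the data set $\mathcal{D}_c$, whereas the latter is based on $\mathcal{D}_c$, $\mathcal{D}_t$, and $\mathcal{D}_b$. Moreover, the estimators maximize different likelihood functions (eq 4 versus eq 8).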
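Solving eq 8 is slightly more complicated than solving eq 4. Maximizing $\log L_{ctb}$ is equivalent to minimizing $\mathcal{R}(\theta, \sigma_c^2, \sigma_t^2, \sigma_b^2)$, which equals $-2\log L_{ctb}$ up to an additive constant:
$$\mathcal{R}(\theta, \sigma_c^2, \sigma_t^2, \sigma_b^2) = N_c\log\sigma_c^2 + \frac{Q(\theta)}{\sigma_c^2} + N_t\log\sigma_t^2 + \frac{1}{\sigma_t^2}\sum_{j=1}^{N_t}\left(r_{t,j} - R_t\right)^2 + N_b\log\sigma_b^2 + \frac{1}{\sigma_b^2}\sum_{j=1}^{N_b}\left(r_{b,j} - R_b\right)^2 \tag{9}$$
Compared to the MLE without control data, the equations for $\theta$ and $(\sigma_c^2, \sigma_t^2, \sigma_b^2)$ are more intertwined. Therefore, we iteratively solve these equations in terms of $\theta$ for fixed $(\sigma_c^2, \sigma_t^2, \sigma_b^2)$ values and then update the $(\sigma_c^2, \sigma_t^2, \sigma_b^2)$ values using the latest solution of $\theta$. Iterations are stopped when convergence conditions are achieved: when the changes in $\mathcal{R}$ and in the parameter values are less than a prespecified tolerance. Denoting by $\hat{\theta}_{ctb}$ the optimal solution for $\theta$, the estimates for the three variances are
$$\hat{\sigma}_c^2 = \frac{Q(\hat{\theta}_{ctb})}{N_c} \tag{10}$$
$$\hat{\sigma}_t^2 = \frac{1}{N_t}\sum_{j=1}^{N_t}\left(r_{t,j} - \hat{R}_t\right)^2 \tag{11}$$
$$\hat{\sigma}_b^2 = \frac{1}{N_b}\sum_{j=1}^{N_b}\left(r_{b,j} - \hat{R}_b\right)^2 \tag{12}$$
where $\hat{R}_t$ and $\hat{R}_b$ are the corresponding elements of $\hat{\theta}_{ctb}$. Compared to the fitting without control data, it is more complicated to adjust these estimates to correct their bias. For larger sample sizes, such as in this study, the bias is not substantial.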
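The alternating scheme described above can be sketched as follows, assuming NumPy and SciPy. Each pass minimizes eq 9 over $\theta$ with the variances held fixed (a weighted least-squares problem, solved here with `scipy.optimize.least_squares`) and then updates the variances with eqs 10–12; because each step minimizes $\mathcal{R}$ over one block of parameters, $\mathcal{R}$ cannot increase from pass to pass. The equal-variance starting point, tolerance, iteration cap, and function names are illustrative assumptions rather than the settings used in this study. The returned standard errors use a Gauss–Newton approximation to the $\theta$ block of the covariance, $(J_w^{\mathsf{T}}J_w)^{-1}$, where $J_w$ is the Jacobian of the variance-weighted residuals.

```python
import numpy as np
from scipy.optimize import least_squares


def hill(x, theta):
    """Classical Hill equation (eq 1); theta = (R_t, R_b, x50, H)."""
    r_t, r_b, x50, h = theta
    return r_b + (r_t - r_b) / (1.0 + 10.0 ** (h * (x - x50)))


def group_residuals(theta, x, r, r_top, r_bot):
    """Residuals of the curve, top-control, and bottom-control data."""
    return (r - hill(x, theta), r_top - theta[0], r_bot - theta[1])


def objective(theta, s2, x, r, r_top, r_bot):
    """Eq 9 (-2 log L_ctb up to a constant); s2 = (s2_c, s2_t, s2_b)."""
    res = group_residuals(theta, x, r, r_top, r_bot)
    return sum(e.size * np.log(v) + np.sum(e ** 2) / v for e, v in zip(res, s2))


def fit_4plc(x, r, r_top, r_bot, tol=1e-6, max_iter=100):
    """Alternate between a weighted least-squares step for theta (variances
    fixed) and the variance updates of eqs 10-12 (theta fixed).
    Each group needs some scatter; otherwise its variance estimate is zero."""
    x, r = np.asarray(x, float), np.asarray(r, float)
    r_top, r_bot = np.asarray(r_top, float), np.asarray(r_bot, float)
    theta = np.array([r_top.mean(), r_bot.mean(), np.median(x), 1.0])
    s2 = np.ones(3)  # first pass: equal variances (the simplified variant)
    previous = np.inf
    for _ in range(max_iter):
        w = 1.0 / np.sqrt(s2)

        def weighted(th):
            res = group_residuals(th, x, r, r_top, r_bot)
            return np.concatenate([wk * e for wk, e in zip(w, res)])

        sol = least_squares(weighted, theta)
        step = np.max(np.abs(sol.x - theta))
        theta = sol.x
        s2 = np.array([np.mean(e ** 2) for e in group_residuals(theta, x, r, r_top, r_bot)])
        current = objective(theta, s2, x, r, r_top, r_bot)
        if abs(previous - current) < tol and step < tol:
            break
        previous = current
    # Gauss-Newton approximation of cov(theta_ctb): inverse of J_w^T J_w, using the
    # Jacobian of the weighted residuals from the last theta step.
    ase = np.sqrt(np.diag(np.linalg.inv(sol.jac.T @ sol.jac)))
    return theta, s2, ase


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    x = np.repeat(np.linspace(-9.0, -4.0, 11), 3)
    r = hill(x, (100.0, 0.0, -6.5, 1.0)) + rng.normal(0.0, 5.0, x.size)
    r_top = 100.0 + rng.normal(0.0, 5.0, 16)
    r_bot = rng.normal(0.0, 1.0, 16)
    theta, s2, ase = fit_4plc(x, r, r_top, r_bot)
    print("theta:", np.round(theta, 3), "variances:", np.round(s2, 2), "ASE:", np.round(ase, 3))
```

Starting from equal variances means the first pass is exactly the simplified, single-variance variant; the later passes let a precise bottom control pull $R_b$ more strongly than a noisy curve.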
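Inference of Standard Error. Next, we discuss the asymptotic standard error of the two MLEs. In general, the asymptotic distribution of an MLE $\hat{\theta}$ is normal, with
$$\sqrt{N}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, I(\theta_0)^{-1}\right) \tag{13}$$
where $N$ and $\theta$ are generic notation for the sample size and parameters, and $\theta_0$ is the vector of the true parameter values. The matrix $I(\theta)$ is the Fisher information matrix evaluated at $\theta$ and defined as
$$I(\theta) = -\mathbb{E}\left[\frac{\partial^2 \log L_i(\theta)}{\partial\theta\,\partial\theta^{\mathsf{T}}}\right] \tag{14}$$
where $L_i$ is the likelihood of the $i$th observation. Since $\theta_0$ is unknown, $I(\theta_0)$ is usually approximated by the sample mean,
$$\hat{I}(\hat{\theta}) = -\frac{1}{N}\sum_{i=1}^{N}\frac{\partial^2 \log L_i(\hat{\theta})}{\partial\theta\,\partial\theta^{\mathsf{T}}}$$
Consequently, the asymptotic covariance of $\hat{\theta}$ is
$$\mathrm{cov}(\hat{\theta}) \approx \frac{1}{N}\hat{I}(\hat{\theta})^{-1} = \left[-\sum_{i=1}^{N}\frac{\partial^2 \log L_i(\hat{\theta})}{\partial\theta\,\partial\theta^{\mathsf{T}}}\right]^{-1} \tag{15}$$
The ASE of each parameter is the square root of the corresponding diagonal element of the asymptotic covariance matrix. Therefore, a $(1-\alpha)\times 100\%$ confidence interval is given by $\hat{\theta} \pm z_{\alpha/2}\,\mathrm{ASE}(\hat{\theta})$, where $z_{\alpha/2}$ is the number of standard deviations ($z$-score) for the given value of $\alpha/2$, such that the probability of a standard normal variable $Z$ being contained in the interval is $\Pr(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$.
To infer the ASE for the MLE based on data excluding controls (eq 4), we used eq 3 in the general ASE expression (eq 15). This leads to
$$\mathrm{cov}(\hat{\theta}_c) \approx 2\sigma_c^2\left[\nabla^2 Q(\hat{\theta}_c)\right]^{-1}$$
The gradient and Hessian of $Q$ are given by
$$\nabla Q(\theta) = -2\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[r_{i,j} - r(x_i,\theta)\right]\nabla r(x_i,\theta)$$
$$\nabla^2 Q(\theta) = 2\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left\{\nabla r(x_i,\theta)\,\nabla r(x_i,\theta)^{\mathsf{T}} - \left[r_{i,j} - r(x_i,\theta)\right]\nabla^2 r(x_i,\theta)\right\}$$
Detailed formulas for the gradient and Hessian of $r(x_i, \theta)$ are given in Appendix S1 of Supporting Information 1. To compute $\mathrm{cov}(\hat{\theta}_c)$, we can plug in either the MLE or the unbiased estimator for $\sigma_c^2$. To infer the ASE for the MLE based on data including controls (eq 8), we use eq 7 in the general ASE expression. In this case, the covariance of $\hat{\theta}_{ctb}$ is approximately
$$\mathrm{cov}(\hat{\theta}_{ctb}) \approx \frac{1}{N_{\mathrm{total}}}\left[\frac{1}{2N_{\mathrm{total}}}\nabla^2_\theta\,\mathcal{R}\left(\hat{\theta}_{ctb}, \hat{\sigma}_c^2, \hat{\sigma}_t^2, \hat{\sigma}_b^2\right)\right]^{-1}$$
where $N_{\mathrm{total}} = N_c + N_t + N_b$ is the total number of observations from all the data sets. The gradient and Hessian of $\mathcal{R}$ are given in Appendix S2 of Supporting Information 1. We plug in the estimates from eqs 10, 11, and 12 to compute $\mathrm{cov}(\hat{\theta}_{ctb})$.
Normalization. Thus far in the Statistical Approaches, we have not discussed the effect of normalization. It is a common practice in the analysis of CRC experiments to normalize the data prior to fitting the Hill equation model. Normalization helps "accommodate minor changes to assay protocols to ensure robustness", a key goal of the AGM. 2 Minor changes in the assay protocols could include variations in the amount of time between the initiation of the enzymatic activity and measurement of the product concentration. Here, we address the question of how normalization affects the MLEs $\hat{\theta}_c$ and $\hat{\theta}_{ctb}$.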
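Normalization is a linear transformation of the original data $r_{i,j}^\circ$ for $j = 1, \ldots, n_i$ and $i = 1, \ldots, m$,
$$r_{i,j} = 100 \times \frac{r_{i,j}^\circ - r_{\min}^\circ}{r_{\max}^\circ - r_{\min}^\circ}$$
to yield a percentage response. As explained in the chapter "Data Standardization for Results Management" in the AGM, in most assays, the bottom and top of the curve are based on the controls. Usually, $r_{\min}^\circ$ and $r_{\max}^\circ$ are the sample means of the two sides' control data. It is also possible to use their medians, which are less sensitive to outliers. Based on the same model as in eq 2, the original data are denoted by $r_{i,j}^\circ$ and satisfy $r_{i,j}^\circ = r^\circ(x_i, \theta^\circ) + \epsilon_{i,j}^\circ$, where $r^\circ(x, \theta^\circ)$ is
$$r^\circ(x, \theta^\circ) = R_b^\circ + \frac{R_t^\circ - R_b^\circ}{1 + 10^{H\,(x - x_{50})}}, \qquad R_b^\circ = r_{\min}^\circ + \frac{r_{\max}^\circ - r_{\min}^\circ}{100}R_b, \qquad R_t^\circ = r_{\min}^\circ + \frac{r_{\max}^\circ - r_{\min}^\circ}{100}R_t$$
Compared with the normalized measurement error, the original measurement error is scaled, $\epsilon_{i,j}^\circ = \epsilon_{i,j}\,(r_{\max}^\circ - r_{\min}^\circ)/100$, so the variance of $\epsilon_{i,j}^\circ$ is $\sigma_c^{\circ 2} = \sigma_c^2\left[(r_{\max}^\circ - r_{\min}^\circ)/100\right]^2$. Thus, if we use the original data $\{r_{i,j}^\circ, x_i\}$ for $j = 1, \ldots, n_i$ and $i = 1, \ldots, m$, the corresponding parameters are $\theta^\circ = (R_b^\circ, R_t^\circ, x_{50}, H)$, with $R_b^\circ$ and $R_t^\circ$ defined above and $x_{50}$ and $H$ remaining the same as in the normalized formulation.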
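The invariance of $x_{50}$ and $H$ under normalization can be checked numerically. The sketch below, which assumes NumPy and SciPy, maps normalized responses back to a hypothetical raw scale with $r^\circ = r^\circ_{\min} + r\,(r^\circ_{\max} - r^\circ_{\min})/100$, fits eq 1 by least squares on both scales, and compares the estimates; the raw-scale limits, solver tolerances, and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares


def hill(x, theta):
    """Classical Hill equation (eq 1); theta = (R_t, R_b, x50, H)."""
    r_t, r_b, x50, h = theta
    return r_b + (r_t - r_b) / (1.0 + 10.0 ** (h * (x - x50)))


def to_original(r, r_min, r_max):
    """Invert the normalization: r_orig = r_min + r * (r_max - r_min) / 100."""
    return r_min + np.asarray(r, dtype=float) * (r_max - r_min) / 100.0


def fit(x, r):
    """Least-squares fit of eq 1 with tight tolerances; returns (theta, Q)."""
    theta0 = np.array([r.max(), r.min(), np.median(x), 1.0])
    sol = least_squares(lambda th: r - hill(x, th), theta0,
                        x_scale="jac", ftol=1e-12, xtol=1e-12, gtol=1e-12)
    return sol.x, float(np.sum(sol.fun ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    x = np.repeat(np.linspace(-9.0, -4.0, 11), 3)
    r_norm = hill(x, (100.0, 0.0, -6.5, 1.0)) + rng.normal(0.0, 5.0, x.size)
    r_min, r_max = 2000.0, 12000.0  # hypothetical raw control means
    theta_n, q_n = fit(x, r_norm)
    theta_o, q_o = fit(x, to_original(r_norm, r_min, r_max))
    print("x50, H (normalized):", theta_n[2:], " (original):", theta_o[2:])
    print("R_t on raw scale vs mapped:", theta_o[0], to_original(theta_n[0], r_min, r_max))
```

Up to solver tolerance, the two fits should agree on $x_{50}$ and $H$, the raw-scale asymptotes should equal the mapped normalized ones, and the residual sum of squares should scale by $[(r^\circ_{\max} - r^\circ_{\min})/100]^2$.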
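The procedure for obtaining the MLE is exactly the same as for normalized data (see the sections entitled Maximum Likelihood Estimation without Control Data and Maximum Likelihood Estimation Using Control Data), except that the data are replaced with the original data and the parameters are changed to $\theta^\circ$. The log-likelihood of the original data (excluding control data) is
$$\log L_c^\circ(\theta^\circ, \sigma_c^{\circ 2}) = -\frac{N_c}{2}\log\left(2\pi\sigma_c^{\circ 2}\right) - \frac{1}{2\sigma_c^{\circ 2}}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[r_{i,j}^\circ - r^\circ(x_i,\theta^\circ)\right]^2$$
It is straightforward to see that the MLE for $\theta^\circ$, denoted by $\hat{\theta}_c^\circ$, has the same relationship with $\hat{\theta}_c$ as $\theta^\circ$ with $\theta$, and the MLE for $x_{50}$ and $H$ remains the same whether the data are normalized or not. If the control data are included in the estimation, would the normalization affect the estimator $\hat{\theta}_{ctb}$? No, normalization does not affect $\hat{\theta}_{ctb}$ either. Note the relationships
$$r_{t,j}^\circ = r_{\min}^\circ + \frac{r_{\max}^\circ - r_{\min}^\circ}{100}r_{t,j} = R_t^\circ + \epsilon_{t,j}^\circ, \qquad r_{b,j}^\circ = r_{\min}^\circ + \frac{r_{\max}^\circ - r_{\min}^\circ}{100}r_{b,j} = R_b^\circ + \epsilon_{b,j}^\circ$$
where $\epsilon_{t,j}^\circ$ and $\epsilon_{b,j}^\circ$ are the correspondingly scaled measurement errors, and $R_b^\circ$ and $R_t^\circ$ are previously defined. Therefore, the models for the normalized control data follow the same model as the unnormalized ones as long as the parameters are transformed accordingly. Using the same logic as for $\hat{\theta}_c$, we can directly conclude that the MLE $\hat{\theta}_{ctb}^\circ$ using the original data has the same estimates for $x_{50}$ and $H$ as using the normalized data. For $R_b$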

Figure 1. Example fits to CRC data with (blue line) or without (orange line) controls. Curve data are shown as black circles and control data as green triangles. The x position of the control data is not meaningful; data on the left correspond to $x \to -\infty$ and data on the right correspond to $x \to \infty$. The vertical lines indicate the true $x_{50}$ (black dotted) or the estimated $x_{50}$ from CRC fitting with (blue dashed) or without (orange dashed) controls.

Figure 2. Example fits to incomplete data with (blue line) and without (orange line) controls. Curve data are shown as black circles and control data as green triangles. In the analysis of the incomplete curves, the points represented by red crosses are removed. The x position of the control data is not meaningful; data on the left correspond to $x \to -\infty$ and data on the right correspond to $x \to \infty$. The vertical lines indicate the true $x_{50}$ (black dotted) or the estimated $x_{50}$ from CRC fitting with (blue dashed) or without (orange dashed) controls.

Figure 3. Estimated standard deviations of the controls and of the curves. Histograms of the estimated standard deviations of the bottom controls, $\hat{\sigma}_b$ (green dash-dotted line), the top controls, $\hat{\sigma}_t$ (blue dashed line), and the curve, $\hat{\sigma}_c$ (red line), for (a) fluorescence and (b) mass spectrometry experiments.

Figure 5. Comparison of ASE estimates for 4PL+C and 4PL analysis of mass spectrometry CRCs from the COVID Moonshot. For each parameter, the fraction of data above the diagonal is $R_b$ (100.00%), $R_t$ (96.57%), $x_{50}$ (76.96%), and $H$ (88.24%).

Figure 6. Histogram of the change in ASE after outlier detection and refitting using the 4PL+C procedure for data from the fluorescence assay. For most parameter estimates ($R_b$: 86.86%, $R_t$: 97.44%, $x_{50}$: 98.46%, $H$: 94.37%), the ASE is reduced (blue), but for some, the ASE increases (orange).

Figure 7. Histogram of the change in ASE after outlier detection and refitting using the 4PL+C procedure for data from the mass spectrometry assay. For most parameter estimates ($R_b$: 78.72%, $R_t$: 96.81%, $x_{50}$: 94.68%, $H$: 89.36%), the ASE is reduced (blue), but for some, the ASE increases (orange).


Table 1. Comparison of the Error in Parameter Estimates from Simulated CRCsᵃ

Table 2. Comparison of Error in Parameter Estimates from Simulated Incomplete CRCsᵃ