4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.3. How are estimates of the unknown parameters obtained?

4.4.3.1. Least Sum of Squares

General LSS Criterion
In least sum of squares (LSS) estimation, the unknown values of the parameters, $\beta_0, \beta_1, \ldots$, in the regression function, $f(\vec{x};\vec{\beta})$, are estimated by finding the numeric values for the parameters that minimize the sum of the squared deviations between the observed responses and the functional portion of the model. Mathematically, the least sum-of-squares criterion that is minimized to obtain the parameter estimates is

$$Q = \sum_{i=1}^{n} \left[ y_i - f(\vec{x}_i;\hat{\vec{\beta}}) \right]^2$$

As previously noted, $\beta_0, \beta_1, \ldots$ are treated as the variables in the optimization and the predictor variable values, $x_1, x_2, \ldots$, are treated as coefficients. To emphasize the fact that the estimates of the parameter values are not the same as the true values of the parameters, the estimates are denoted by $\hat{\beta}_0, \hat{\beta}_1, \ldots$. For linear models the least squares minimization is usually done analytically using calculus. For nonlinear models, on the other hand, the minimization must almost always be done using iterative numerical algorithms.
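As an illustration only, the criterion can be written as a small Python function that takes a regression function, a trial set of parameter values, and the data. The names `lss_criterion` and `line`, and the data values, are made up for this sketch and do not come from the handbook.

```python
def lss_criterion(f, params, xs, ys):
    """Sum of squared deviations between observed responses ys and f(x, params)."""
    return sum((y - f(x, params)) ** 2 for x, y in zip(xs, ys))


def line(x, b):
    """Hypothetical straight-line regression function: b[0] + b[1] * x."""
    return b[0] + b[1] * x


# Made-up illustrative data, not taken from the handbook.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# The data are held fixed; only the trial parameter values change.
q_good = lss_criterion(line, (1.0, 2.0), xs, ys)  # trial values close to the data
q_bad = lss_criterion(line, (0.0, 1.0), xs, ys)   # trial values far from the data
```

Here the trial values (1.0, 2.0) give a much smaller criterion value than (0.0, 1.0). Least squares estimation amounts to searching for the parameter values that make this quantity as small as possible.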
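LSS for Straight Line
To make this more concrete, consider the straight-line model,

$$y = \beta_0 + \beta_1 x + \varepsilon.$$

For this model the least squares estimates of the parameters would be computed by minimizing

$$Q = \sum_{i=1}^{n} \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2.$$
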
Doing this by
  1. taking partial derivatives of $Q$ with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$,
  2. setting each partial derivative equal to zero, and
  3. solving the resulting system of two equations with two unknowns
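yields the following estimates for the parameters:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$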
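A minimal sketch of the closed-form straight-line estimates in plain Python follows. The function name `fit_straight_line` and the data values are assumptions made for illustration; the slope is computed first because the intercept estimate depends on it.

```python
def fit_straight_line(xs, ys):
    """Return (b0_hat, b1_hat), the least squares intercept and slope."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    b1_hat = sxy / sxx                 # slope estimate
    b0_hat = y_bar - b1_hat * x_bar    # intercept estimate, depends on the slope
    return b0_hat, b1_hat


# Made-up illustrative data, not taken from the handbook.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b0_hat, b1_hat = fit_straight_line(xs, ys)  # approximately 1.15 and 1.94
```

For real work a tested library routine is usually preferable; this sketch is only meant to mirror the formulas term by term.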
These formulas are instructive because they show that the parameter estimates are functions of both the predictor and response variables and that the parameter estimates are not independent of one another unless $\bar{x} = 0$. This is clear because the formula for the estimate of the intercept depends directly on the value of the estimate for the slope, except when the second term in the formula cancels out through multiplication by zero. This means that if the estimate of the slope deviates a lot from the true slope, then the estimate of the intercept will tend to deviate a lot from its true value too. This lack of independence of the parameter estimates, or more specifically the correlation of the parameter estimates, becomes important when computing the uncertainties of predicted values from the model. Although the formulas discussed in this paragraph only apply to the straight-line model, the relationship between the parameters is analogous for more complicated models, including both statistically linear and statistically nonlinear models.
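The dependence between the two estimates can be made explicit with a standard result that is not stated in this section. It assumes the random errors are uncorrelated with mean zero and common standard deviation $\sigma$. Under those assumptions $\bar{y}$ and the slope estimate are uncorrelated, so the intercept formula gives

$$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\bar{x}\,\mathrm{Var}(\hat{\beta}_1) = -\frac{\sigma^2 \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

The covariance is zero exactly when $\bar{x} = 0$, and its sign is opposite to the sign of $\bar{x}$.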
Quality of Least Squares Estimates
From the preceding discussion, which focuses on how the least squares estimates of the model parameters are computed and on the relationship between the parameter estimates, it is difficult to picture exactly how good the parameter estimates are. They are, in fact, often quite good. The plot below shows the data from the Pressure/Temperature example with the fitted regression line and the true regression line, which is known in this case because the data are simulated. It is clear from the plot that the two lines, the solid one estimated by least squares and the dashed one obtained from the inputs to the simulation, are almost identical over the range of the data. Because the least squares line approximates the true line so well in this case, the least squares line will serve as a useful description of the deterministic portion of the variation in the data, even though it is not a perfect description. While this plot is just one example, the relationship between the estimated and true regression functions shown here is fairly typical.
[Figure: Comparison of LSS Line and True Line. Data from the Pressure/Temperature example with fitted and true regression lines.]
Quantifying the Quality of the Fit for Real Data
From the plot above it is easy to see that the line based on the least squares estimates of $\beta_0$ and $\beta_1$ is a good estimate of the true line for this simulated data. For real data, of course, this type of direct comparison is not possible. Plots comparing the model to the data can still provide valuable information on the adequacy and usefulness of the model, however. In addition, the average quality of the fit of a regression function to a set of data by least squares can be quantified using the remaining parameter in the model, $\sigma$, the standard deviation of the probability distribution describing the random variation in the data.
Like the parameters in the functional part of the model, $\sigma$ is generally not known, but it can also be estimated from the least squares equations. The formula for the estimate is

$$\hat{\sigma} = \sqrt{\frac{Q}{n-p}} = \sqrt{\frac{\sum_{i=1}^{n} \left[ y_i - f(\vec{x}_i;\hat{\vec{\beta}}) \right]^2}{n-p}},$$

where $n$ is the number of observations in the sample and $p$ is the number of parameters in the functional part of the model.
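The estimate of the standard deviation can be sketched in a few lines of Python. The function name `residual_standard_deviation` and the numbers below are hypothetical; the fitted values are assumed to come from a straight-line fit, which has two parameters in its functional part.

```python
import math


def residual_standard_deviation(ys, fitted, n_params):
    """Square root of the minimized sum of squares divided by (n - p)."""
    n = len(ys)
    if n != len(fitted):
        raise ValueError("ys and fitted must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    # Minimized criterion: sum of squared residuals at the fitted values.
    q = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(q / (n - n_params))


# Made-up illustrative responses and fitted values, not taken from the handbook.
ys = [3.1, 4.9, 7.2, 8.8]
fitted = [3.09, 5.03, 6.97, 8.91]
sigma_hat = residual_standard_deviation(ys, fitted, n_params=2)
```

Dividing by the number of observations minus the number of parameters, rather than by the number of observations alone, accounts for the parameters already estimated from the same data.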
Because $\sigma$ measures how the individual values of the response variable vary with respect to their true values, it also contains information about how far from the truth quantities derived from the data, such as the estimated values of the parameters, could be. Knowledge of the approximate value of $\sigma$, together with the values of the predictor variables, can be combined to provide estimates of the average deviation between the different aspects of the model and the corresponding true values, quantities that can be related to properties of the process generating the data that we would like to know.
More information on the correlation of the parameter estimates and computing uncertainties for different functions of the estimated regression parameters can be found in Section 5.