|
4.
Process Modeling
4.6. Case Studies in Process Modeling 4.6.2. Alaska Pipeline
|
|||
| Weighting | Another approach when the assumption of constant standard deviation of the errors (i.e., homogeneous variances) is violated is to perform a weighted fit. In a weighted fit, less precise measurements receive less weight and more precise measurements receive more weight when the model parameters are estimated. | ||
| Finding An Appropriate Weight Function |
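The obvious question is: how do we determine an
appropriate weighting function?
Weighted least squares estimates are found by minimizing
$$Q = \sum_{i=1}^{n} w_i \left[ y_i - f(x_i; \beta) \right]^2$$
with respect to the parameters $\beta$, where $w_i$ is the weight given to observation $i$. The estimates are most precise when each weight is inversely proportional to the variance of the corresponding response, $w_i \propto 1/\sigma_i^2$.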
|
||
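As a rough illustration of the minimization described above, the sketch below fits a straight line by weighted least squares using the weighted normal equations. It assumes a straight-line model and NumPy; the function names `weighted_line_fit` and `weighted_sse` and the demonstration data are hypothetical and are not taken from the case study.

```python
import numpy as np

def weighted_line_fit(x, y, w):
    """Return (intercept, slope) minimizing sum(w * (y - b0 - b1*x)**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Design matrix for a straight line: a column of ones and the predictor.
    X = np.column_stack([np.ones_like(x), x])
    # Weighted normal equations: (X' W X) beta = X' W y, with W = diag(w).
    XtW = X.T * w
    b0, b1 = np.linalg.solve(XtW @ X, XtW @ y)
    return float(b0), float(b1)

def weighted_sse(x, y, w, b0, b1):
    """Weighted sum of squares Q for a given intercept and slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * (y - b0 - b1 * x) ** 2))

# Synthetic demonstration data (made up for illustration only).
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([5.1, 7.9, 11.2, 13.8, 17.5])
w_demo = 1.0 / x_demo**1.5  # less weight where the predictor is larger

b0_demo, b1_demo = weighted_line_fit(x_demo, y_demo, w_demo)
q_demo = weighted_sse(x_demo, y_demo, w_demo, b0_demo, b1_demo)
print(b0_demo, b1_demo, q_demo)
```

Solving the normal equations directly is adequate for a two-parameter line; for larger or ill-conditioned models, a QR-based or library least squares routine would be a safer choice.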
| Replication in the Data |
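The obvious way to estimate the weights $w_i$,
if there are replicates in the data, is to take the reciprocal of the sample variance $\hat{\sigma}_i^2$ of the replicated responses at each value of the predictor:
$$w_i = \frac{1}{\hat{\sigma}_i^2}$$
However, this rarely works well because the weights are extremely variable when estimated this way. |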
||
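The replicate-based approach can be sketched as follows, assuming that replicates are observations sharing exactly the same predictor value and that every predictor value has at least two observations. The function name `replicate_weights` and the small data set are hypothetical.

```python
import numpy as np

def replicate_weights(x, y):
    """Return (levels, weights), with weights = 1 / sample variance per level.

    Assumes every distinct x value has at least two observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # 'inverse' maps each observation to the index of its distinct x value.
    levels, inverse = np.unique(x, return_inverse=True)
    variances = np.array(
        [np.var(y[inverse == k], ddof=1) for k in range(len(levels))]
    )
    return levels, 1.0 / variances

# Synthetic example: three replicates at each of two predictor values.
levels_demo, weights_demo = replicate_weights(
    [1, 1, 1, 2, 2, 2],
    [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
)
print(levels_demo, weights_demo)
```

With only a handful of replicates per level, each sample variance is itself a noisy estimate, which is why weights computed this way tend to vary so much.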
| An Improved Strategy |
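A better strategy for estimating the weights is
to find a function that relates the variance of the response, $\sigma_i^2$,
to the predictor variable, $x_i$.
If $\sigma_i^2 \approx g(x_i)$ for some function $g$, then the weights can be taken as $w_i = 1/g(x_i)$.
One model, called the power model, that often works well for modeling the variances is
$$\sigma_i^2 = a\,x_i^{c}$$
where $a$ and $c$ are constants to be estimated.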
|
||
| Estimate Weights Using Power Function |
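To estimate the weights using the power model $\sigma_i^2 = a\,x_i^{c}$, fit the function
$$\ln(\hat{\sigma}_i^2) = \ln(a) + c\,\ln(x_i)$$
where $\hat{\sigma}_i^2$ is the sample variance of the replicates at $x_i$. Then use
$$w_i = \frac{1}{x_i^{\hat{c}}}$$
as the weights. The constant $a$ can be dropped because only the relative sizes of the weights matter.
You should check the residuals from the fit used to estimate c to make sure everything looks reasonable. This fit does not have to meet the standards usually applied to a final model, however. |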
||
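A minimal sketch of this step, assuming the power model has the form variance = a * x^c, so that the log of the variance is linear in the log of the predictor with slope c. The function names `fit_power_exponent` and `power_weights` are hypothetical, and the demonstration variances are constructed to follow the power model exactly.

```python
import numpy as np

def fit_power_exponent(x, variances):
    """Fit ln(variance) = ln(a) + c * ln(x) and return (ln_a, c)."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_v = np.log(np.asarray(variances, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    c, ln_a = np.polyfit(log_x, log_v, 1)
    return float(ln_a), float(c)

def power_weights(x, c):
    """Weights 1 / x**c; the constant a is dropped since only ratios matter."""
    return 1.0 / np.asarray(x, dtype=float) ** c

# Synthetic variances built from a = 0.5, c = 1.5 (illustration only).
x_demo = np.array([2.0, 5.0, 10.0, 20.0, 40.0, 80.0])
var_demo = 0.5 * x_demo**1.5

ln_a_demo, c_demo = fit_power_exponent(x_demo, var_demo)
w_demo = power_weights(x_demo, c_demo)
print(c_demo, w_demo)
```

In practice the estimated exponent is often rounded to a convenient value before the weights are formed, since the weighted fit is not very sensitive to small changes in c.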
| Replicates Not Available |
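If there are few or no replicates in the data, then
we can approximate the replication case as follows.
Divide the data into several ranges in which the
responses have similar means. That is, we pick the
ranges narrow enough that a plot of the data within
each range shows little slope.
We then treat the points in each range as replicates and compute the sample variance of the responses, $\hat{\sigma}_j^2$, and the mean of the predictor, $\bar{x}_j$, for each range $j$.
Then fit
$$\ln(\hat{\sigma}_j^2) = \ln(a) + c\,\ln(\bar{x}_j)$$
and use the estimated exponent $\hat{c}$ to form the weights $w_i = 1/x_i^{\hat{c}}$.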
|
||
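One way to form such pseudo-replicate groups is to sort the data by the predictor and take a fixed number of consecutive points per group. The sketch below assumes that approach, and it assumes the summary for each group is the mean of the predictor and the sample variance of the response. The function name `pseudo_replicate_groups` and the simulated data are hypothetical.

```python
import numpy as np

def pseudo_replicate_groups(x, y, group_size=4):
    """Sort by x, split into consecutive groups, return (x means, y variances).

    A trailing group with fewer than two points is skipped, because its
    sample variance is undefined.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")  # stable sort keeps tie order fixed
    xs, ys = x[order], y[order]
    means, variances = [], []
    for start in range(0, len(xs), group_size):
        xg = xs[start:start + group_size]
        yg = ys[start:start + group_size]
        if len(yg) < 2:
            continue
        means.append(xg.mean())
        variances.append(yg.var(ddof=1))
    return np.array(means), np.array(variances)

# Simulated data (not the pipeline measurements): noise grows with x.
rng = np.random.default_rng(0)
x_demo = rng.permutation(np.arange(1.0, 108.0))  # 107 distinct values
y_demo = 2.0 + 0.8 * x_demo + rng.normal(scale=0.1 * x_demo**0.75)

means_demo, vars_demo = pseudo_replicate_groups(x_demo, y_demo, group_size=4)
print(len(means_demo))
```

Because tied predictor values can be ordered differently by different sorting routines, group membership, and therefore the estimated variances, may differ slightly between software packages.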
| Approaches to Forming Replicate Groups |
There are several possible approaches to forming the
replicate groups.
|
||
| Weighted Residuals |
One complication with weighted analysis is the fact
that the distribution of the residuals can vary
substantially with the different values of the
predictor variable.
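This necessitates the use of weighted residuals when plotting residuals. The weighted residuals are given by
$$r_i = \sqrt{w_i}\,(y_i - \hat{y}_i)$$
where $\hat{y}_i$ is the predicted value from the weighted fit.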
|
||
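The following sketch shows the usual way of scaling residuals for plotting after a weighted fit, assuming the weighted residual is the raw residual multiplied by the square root of its weight. The function name `weighted_residuals` and the numbers are hypothetical.

```python
import numpy as np

def weighted_residuals(y, y_hat, w):
    """Return sqrt(w) * (y - y_hat) for each observation."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.sqrt(w) * (y - y_hat)

# Illustration: the raw residuals differ by a factor of two, but the
# observation with the larger residual carries the smaller weight.
y_demo = np.array([10.0, 20.0])
y_hat_demo = np.array([8.0, 16.0])
w_demo = np.array([0.25, 0.0625])

r_demo = weighted_residuals(y_demo, y_hat_demo, w_demo)
print(r_demo)
```

If the weights are proportional to the reciprocal of the variances, the scaled residuals should have roughly constant spread across the range of the predictor, which is what the residual plots are meant to check.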
| Fit for Estimating Weights |
For the pipeline data, we chose replicate groups
so that each group has four observations (the last
group has only three). The groups were formed by first
sorting the data by the predictor variable and then
taking four points in succession to form each
replicate group.
Dataplot generated the following output for the fit of log(variances) against log(means) for the replicate groups. The output has been edited slightly for display.
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 27
NUMBER OF VARIABLES = 1
NO REPLICATION CASE
PARAMETER ESTIMATES (APPROX. ST. DEV.) T VALUE
1 A0 -3.18451 (0.8265 ) -3.9
2 A1 XTEMP 1.69001 (0.2344 ) 7.2
RESIDUAL STANDARD DEVIATION = 0.8561206460
RESIDUAL DEGREES OF FREEDOM = 25
The fit output and the plot of the replicate variances against the replicate means show that a linear fit is reasonable, with an estimated slope of 1.69. Note that this data set has a small number of replicates, so you may get a slightly different estimate for the slope. For example, S-PLUS generated a slope estimate of 1.52. The difference is caused by the sorting of the predictor variable (i.e., where we have actual replicates in the data, different sorting algorithms may put some observations in different replicate groups). In practice, any choice of c in the range 1.5 to 2.0 is reasonable and should produce comparable results for the weighted fit. We used an estimate of 1.5 for c in the weighting function. |
||
| Residual Plot for Weight Function |
The residual plot from the fit to determine an appropriate weighting function reveals no obvious problems. |
||
| Numerical Output from Weighted Fit |
Dataplot generated the following output for the
weighted fit (edited slightly for display).
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 107
NUMBER OF VARIABLES = 1
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.6112687111D+01
REPLICATION DEGREES OF FREEDOM = 29
NUMBER OF DISTINCT SUBSETS = 78
PARAMETER ESTIMATES (APPROX. ST. DEV.) T VALUE
1 A0 2.35234 (0.5431 ) 4.3
2 A1 LAB 0.806363 (0.2265E-01) 36.
RESIDUAL STANDARD DEVIATION = 0.3645902574
RESIDUAL DEGREES OF FREEDOM = 105
REPLICATION STANDARD DEVIATION = 6.1126871109
REPLICATION DEGREES OF FREEDOM = 29
The weighted fit gives a slope of 0.81 and an intercept
of 2.35, compared with a slope of 0.73 and
an intercept of 4.99 for
the original model.
|
||
| Plot of Predicted Values |
The plot of the predicted values with the data indicates a good fit. |
||
| 6-Plot of Fit |
We need to verify that the weighting did not result in the other regression assumptions being violated. The 6-plot indicates that the regression assumptions are satisfied. |
||
| Plot of Residuals |
To check the assumption of homogeneous variances for the residuals in more detail, we generate a full-size plot of the residuals versus the predictor variable. This plot shows that the residuals now exhibit homogeneous variances. |
||