4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.2. Alaska Pipeline

4.6.2.3. Initial Linear Fit

Linear Fit Output

Based on the initial plot of the data, we first fit a straight-line model to the data.

The following fit output was generated by Dataplot (it has been edited slightly for display).

  
 LEAST SQUARES MULTILINEAR FIT
 SAMPLE SIZE N       =      107
 NUMBER OF VARIABLES =        1
 REPLICATION CASE
 REPLICATION STANDARD DEVIATION =     0.6112687111D+01
 REPLICATION DEGREES OF FREEDOM =          29
 NUMBER OF DISTINCT SUBSETS     =          78
  
  
       PARAMETER ESTIMATES           (APPROX. ST. DEV.)    T VALUE
1  A0                   4.99368       ( 1.126    )          4.4
2  A1       LAB        0.731111       (0.2455E-01)          30.
  
RESIDUAL    STANDARD DEVIATION =         6.0809240341
RESIDUAL    DEGREES OF FREEDOM =         105
REPLICATION STANDARD DEVIATION =         6.1126871109
REPLICATION DEGREES OF FREEDOM =          29
LACK OF FIT F RATIO =       0.9857
  = THE  46.3056% POINT OF THE
F DISTRIBUTION WITH     76 AND     29 DEGREES OF FREEDOM

      
The intercept parameter is estimated to be 4.99 and the slope parameter is estimated to be 0.73. Both parameters are statistically significant, with t values of 4.4 and 30, respectively.
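
The arithmetic behind the t values and the lack-of-fit F ratio can be checked directly from the numbers in the output. The following is a minimal sketch, not Dataplot's own computation: it assumes the usual definitions (t value = estimate divided by its approximate standard deviation; lack-of-fit sum of squares = residual sum of squares minus replication sum of squares). The variable names are illustrative.

```python
# Values copied from the Dataplot output above.
a0, se_a0 = 4.99368, 1.126
a1, se_a1 = 0.731111, 0.2455e-01
resid_sd, resid_df = 6.0809240341, 105
rep_sd, rep_df = 6.1126871109, 29

# t value = parameter estimate / approximate standard deviation.
t_a0 = a0 / se_a0
t_a1 = a1 / se_a1

# Recover the sums of squares from the standard deviations.
ss_resid = resid_sd**2 * resid_df   # residual sum of squares
ss_pure = rep_sd**2 * rep_df        # replication (pure error) sum of squares
ss_lof = ss_resid - ss_pure         # lack-of-fit sum of squares
lof_df = resid_df - rep_df          # lack-of-fit degrees of freedom

# F ratio = lack-of-fit mean square / replication mean square.
f_ratio = (ss_lof / lof_df) / (ss_pure / rep_df)

print(f"t(A0) = {t_a0:.1f}, t(A1) = {t_a1:.1f}")
print(f"lack-of-fit F = {f_ratio:.4f} on {lof_df} and {rep_df} df")
```

An F ratio close to 1, falling near the middle of its reference distribution, gives no evidence that the straight-line form itself is inadequate. It does not, however, test the constant-variance assumption.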

6-Plot for Model Validation

When there is a single independent variable, the 6-plot provides a convenient method for initial model validation.

[Figure: 6-plot showing six model validation plots]

The basic assumptions for regression models are that the residuals are random observations from a common distribution with constant mean and constant standard deviation (or variance).

The plots in the first row show that the variance of the residuals increases as the independent variable (lab) increases. This indicates that the assumption of constant standard deviation, or homogeneity of variances, is violated.

To see this more clearly, we will generate full-size plots of the predicted values overlaid on the data and of the residuals against the independent variable.

Plot of Predicted Values with Original Data

[Figure: plot of predicted values with the raw data, indicating a problem with the homogeneous variance assumption]

This plot shows more clearly that the assumption of homogeneous variances for the residuals may be violated.

Plot of Residual Values Against Independent Variable

[Figure: plot of residuals versus the independent variable, showing the non-homogeneous variances more clearly]

This plot also shows more clearly that the assumption of homogeneous variances is violated. This assumption, along with the assumption of constant location, is typically easiest to check on this plot.
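
A numerical companion to the residual plot is to compare the spread of the residuals across ranges of the independent variable. The sketch below is an illustration only: it uses synthetic data (not the pipeline data) whose noise standard deviation is assumed to grow in proportion to x, and the helper names `fit_line` and `residual_sd_by_bin` are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares fit of y = a0 + a1*x; returns (a0, a1)."""
    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1

def residual_sd_by_bin(x, resid, n_bins=3):
    """Standard deviation of residuals in equal-count bins of x, lowest x first."""
    ordered = [r for _, r in sorted(zip(x, resid))]
    size = len(ordered) // n_bins
    bins = [ordered[i * size:(i + 1) * size] for i in range(n_bins - 1)]
    bins.append(ordered[(n_bins - 1) * size:])  # last bin takes any remainder
    return [statistics.stdev(b) for b in bins]

# Synthetic data (NOT the pipeline data): noise SD is assumed to be 0.1 * x.
rng = random.Random(0)
x = [rng.uniform(1, 80) for _ in range(300)]
y = [5 + 0.73 * xi + rng.gauss(0, 0.1 * xi) for xi in x]

a0, a1 = fit_line(x, y)
resid = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
sds = residual_sd_by_bin(x, resid)
print([round(s, 2) for s in sds])
```

Residual standard deviations that rise steadily from the lowest bin to the highest correspond to the widening pattern in the residual plot. Roughly equal values would be consistent with homogeneous variances.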

Non-Homogeneous Variances

Although the violation is mild rather than gross, we can try to improve the quality of the fit by addressing the non-homogeneous variances of the residuals. We will use transformations and weighted fits to see if we can improve on the current model.
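
As a rough illustration of what a weighted fit involves, the sketch below fits a straight line by weighted least squares, giving less weight to observations assumed to be noisier. It is not the weighting used in this case study: the data values are hypothetical, the weights 1/x^2 are an assumption chosen only for illustration, and `weighted_line_fit` is a hypothetical helper name.

```python
def weighted_line_fit(x, y, w):
    """Minimize sum(w_i * (y_i - a0 - a1*x_i)**2); returns (a0, a1)."""
    sw = sum(w)
    # Weighted means of x and y.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross products about the weighted means.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    a1 = sxy / sxx
    return ybar - a1 * xbar, a1

# Hypothetical values for illustration only (not the pipeline data).
x = [10.0, 20.0, 40.0, 80.0]
y = [12.1, 19.4, 35.0, 61.0]
# Assumed weights: inverse of a variance taken to grow as x**2.
w = [1.0 / xi**2 for xi in x]

a0, a1 = weighted_line_fit(x, y, w)
print(f"intercept = {a0:.3f}, slope = {a1:.3f}")
```

With equal weights the function reduces to the ordinary least squares fit. In practice the weights should be chosen from the data, for example from an estimate of how the variance changes with the independent variable.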