4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.2. Alaska Pipeline

4.6.2.4. Transformations to Improve Fit

Transformations In regression modeling, we often apply transformations to achieve the following two goals:
  1. to satisfy the homogeneity of variances assumption for the residuals.
  2. to linearize the fit as much as possible.
Some care and judgment are required because these two goals can conflict. We generally try to achieve homogeneous residual variances first and then address the issue of linearizing the fit.
Plot of Common Transformations to Obtain Homogeneous Variances The first step is to try transformations of the response variable that will result in homogeneous variances. In practice, the square root, log, and reciprocal transformations often work well for this purpose. We will try these first.

plot of transformations indicates log transformation is best

In examining these plots, we are looking for the plot that shows the most constant variability across the horizontal range of the plot.

This plot indicates that the log transformation is a good candidate for achieving the most homogeneous variances.
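The visual comparison can also be summarized numerically. The following is a minimal sketch, not the handbook's procedure: it uses synthetic, hypothetical data (not the Alaska pipeline measurements) with replicated responses at a few predictor levels, and it uses an assumed summary statistic, the ratio of the within-level standard deviation at the largest predictor value to that at the smallest. A ratio near 1 suggests roughly constant variability.

```python
import math
import random
import statistics

random.seed(0)

# Synthetic, hypothetical data: replicated responses at four predictor levels,
# generated with multiplicative noise so that the spread of y grows with x.
levels = [10.0, 20.0, 40.0, 80.0]
data = {
    x: [2.0 * x * math.exp(random.gauss(0.0, 0.15)) for _ in range(200)]
    for x in levels
}

# Candidate transformations of the response variable.
transforms = {
    "raw": lambda y: y,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda y: 1.0 / y,
}


def spread_ratio(transform):
    """Within-level std. dev. at the largest x divided by that at the smallest x."""
    low = statistics.stdev([transform(y) for y in data[levels[0]]])
    high = statistics.stdev([transform(y) for y in data[levels[-1]]])
    return high / low


ratios = {name: spread_ratio(f) for name, f in transforms.items()}

# The transformation whose ratio is closest to 1 (on a log scale) is the
# most homogeneous by this assumed criterion.
best = min(ratios, key=lambda name: abs(math.log(ratios[name])))

for name, r in ratios.items():
    print(f"{name:>10}: spread ratio = {r:.2f}")
print("most homogeneous:", best)
```

Because the synthetic noise is multiplicative by construction, the log transformation wins here; with real data the choice should still be confirmed from the plots.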

Plot of Common Transformations to Linearize the Fit One problem with applying the above transformation is that the plot indicates that a straight line fit will no longer be an adequate model for the data. We address this problem by attempting to find a transformation of the predictor variable that will result in the most linear fit. In practice, the square root, log, and reciprocal transformations often work well for this purpose. We will try these first.

plot of transformations indicates log transformation is best

This plot shows that the log transformation of the predictor variable is a good candidate.

Box-Cox Linearity Plot The previous step can be approached more formally by the use of the Box-Cox linearity plot. The x value corresponding to the maximum y value on the plot indicates the power transformation that yields the most linear fit.

Box-Cox plot shows a value of approximately -0.1 achieves the most linear fit

This plot indicates that a value of -0.1 achieves the most linear fit.

In practice, for ease of interpretation, we often prefer to use a common transformation, such as the log or square root, rather than the value that yields the mathematical maximum. However, the Box-Cox linearity plot still indicates whether our choice is a reasonable one. That is, we might sacrifice a small amount of linearity in the fit to have a simpler model.

In this case, a value of 0.0 would indicate a log transformation. Although the optimal value from the plot is -0.1, the plot indicates that any value between -0.2 and 0.2 will yield fairly similar results. For that reason, we choose to stick with the common log transformation.
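The Box-Cox linearity curve can be sketched in a few lines. The example below is an illustration under stated assumptions: the data are synthetic and hypothetical (generated so that the response is linear in the log of the predictor), the grid of power values from -2 to 2 is an arbitrary choice, and the helper names `box_cox` and `pearson` are introduced here for illustration. For each power value, the predictor is transformed and its correlation with the response is recorded; the maximum of that curve marks the most linear fit.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox power transformation; lam = 0 corresponds to the log transformation."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


random.seed(1)

# Synthetic, hypothetical data: the (already transformed) response is linear
# in log(x), plus noise. These are not the Alaska pipeline measurements.
x = [random.uniform(5.0, 90.0) for _ in range(500)]
y = [0.3 + 0.9 * math.log(xi) + random.gauss(0.0, 0.17) for xi in x]

# Correlation between the response and the transformed predictor for each power.
lambdas = [i / 10 for i in range(-20, 21)]
curve = {lam: pearson([box_cox(xi, lam) for xi in x], y) for lam in lambdas}

best_lambda = max(curve, key=curve.get)
print("power with maximum correlation:", best_lambda)
for lam in (-1.0, 0.0, 0.5, 1.0):  # reciprocal, log, square root, no transformation
    print(f"lambda = {lam:4.1f}: correlation = {curve[lam]:.4f}")
```

Printing the correlations at the common power values alongside the maximum shows how little linearity is given up by choosing a simpler transformation near the optimum.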

Log-Log Fit Based on the above plots, we choose to fit a log-log model. Dataplot generated the following output for this model (it is edited slightly for display).
  
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N       =      107
NUMBER OF VARIABLES =        1
REPLICATION CASE
REPLICATION STANDARD DEVIATION =     0.1369758099D+00
REPLICATION DEGREES OF FREEDOM =          29
NUMBER OF DISTINCT SUBSETS     =          78
  
  
       PARAMETER ESTIMATES           (APPROX. ST. DEV.)    T VALUE
1  A0                  0.281384       (0.8093E-01)          3.5
2  A1       XTEMP      0.885175       (0.2302E-01)          38.
  
RESIDUAL    STANDARD DEVIATION =         0.1682604253
RESIDUAL    DEGREES OF FREEDOM =         105
REPLICATION STANDARD DEVIATION =         0.1369758099
REPLICATION DEGREES OF FREEDOM =          29
LACK OF FIT F RATIO =       1.7032 = THE  94.4923% POINT OF THE
F DISTRIBUTION WITH     76 AND     29 DEGREES OF FREEDOM

      
Note that although the residual standard deviation is considerably lower than it was for the original fit, we cannot compare them directly since the fits were performed on different scales.
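The summary statistics in the output can be cross-checked by hand. The sketch below assumes the usual lack-of-fit decomposition, in which the residual sum of squares is split into a replication (pure error) part and a lack-of-fit part; the reported values are consistent with that decomposition, though the section does not state how Dataplot computes them. All numbers are copied from the output above.

```python
# Values copied from the Dataplot output above.
n = 107                    # sample size
n_params = 2               # A0 and A1
resid_sd = 0.1682604253    # residual standard deviation
rep_sd = 0.1369758099      # replication standard deviation
rep_df = 29                # replication degrees of freedom

# Degrees of freedom.
resid_df = n - n_params        # 105, as reported
lof_df = resid_df - rep_df     # 76, the numerator degrees of freedom

# Sums of squares recovered from the standard deviations.
resid_ss = resid_sd ** 2 * resid_df
rep_ss = rep_sd ** 2 * rep_df
lof_ss = resid_ss - rep_ss

# Lack-of-fit F ratio: lack-of-fit mean square over replication mean square.
f_ratio = (lof_ss / lof_df) / rep_sd ** 2

# T values: parameter estimate divided by its approximate standard deviation.
t_a0 = 0.281384 / 0.8093e-01
t_a1 = 0.885175 / 0.2302e-01

print(f"lack-of-fit F ratio = {f_ratio:.4f} on {lof_df} and {rep_df} degrees of freedom")
print(f"t(A0) = {t_a0:.1f}, t(A1) = {t_a1:.0f}")
```

The recomputed F ratio agrees with the reported 1.7032. Since that value is the 94.4923% point of the F distribution, the upper-tail probability is about 5.5%.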
Plot of Predicted Values

plot of predicted values with raw data

The plot of the predicted values with the transformed data indicates a good fit. In addition, the variability of the data across the horizontal range of the plot seems relatively constant.
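Predictions from the log-log fit can be mapped back to the original scale, where the model becomes a power law. The sketch below uses the parameter estimates from the Dataplot output; the function name `predict_original_scale`, the example predictor values, and the logarithm base are assumptions, since this section does not state which logarithm was used.

```python
import math

# Parameter estimates copied from the Dataplot output above.
A0 = 0.281384
A1 = 0.885175


def predict_original_scale(x, base=math.e):
    """Back-transform the log-log fit.

    log(y) = A0 + A1 * log(x)   implies   y = base**A0 * x**A1.

    `base` is an assumption: the section does not say which logarithm was used.
    The exponent A1 does not depend on the base; only the multiplier base**A0 does.
    """
    return base ** A0 * x ** A1


for x in (10.0, 40.0, 80.0):  # hypothetical predictor values
    print(f"x = {x:5.1f}  ->  predicted y = {predict_original_scale(x):.2f}")
```

Because the exponent is below 1, the predicted response grows slightly less than proportionally with the predictor, whichever base applies.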

6-Plot of Fit Since we transformed the data, we need to validate that all of the regression assumptions are still satisfied.

6-plot indicates regression assumptions satisfied

The 6-plot of the residuals indicates that all of the regression assumptions are now satisfied.

Plot of Residuals In order to see more detail, we generate a full size version of the residuals versus predictor variable plot.

plot of residuals versus predictor variable shows homogeneous variances for residuals

This plot clearly shows that the residuals now satisfy the assumption of homogeneous variances.
