|
4.
Process Modeling
4.6. Case Studies in Process Modeling 4.6.2. Alaska Pipeline
|
|||
| Transformations |
In regression modeling, we often apply transformations
to achieve two goals: homogeneous variances of the errors
and a fit that is as close to linear as possible.
|
||
| Plot of Common Transformations to Obtain Homogeneous Variances |
The first step is to try transformations of the
response variable that will result in homogeneous
variances. In practice, the square root, log, and
reciprocal transformations often work well for
this purpose. We will try these first.
In examining these plots, we are looking for the plot that shows the most constant variability across the horizontal range of the plot. This plot indicates that the log transformation is a good candidate for achieving the most homogeneous variances. |
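The comparison described above can also be made numerically. The following is a minimal sketch, not the analysis used in this case study: it assumes synthetic replicate data with multiplicative noise (so the spread of the response grows with its mean) and summarizes each transformation by the ratio of the largest to the smallest within-level standard deviation. A ratio near 1 suggests roughly homogeneous variances. The data, the `spread_ratio` helper, and the noise level are all illustrative assumptions.

```python
import math
import random
import statistics

random.seed(0)

# Synthetic data (an assumption, not the pipeline measurements):
# replicate responses at a few predictor levels, with multiplicative
# noise so that the spread of y grows with its mean.
levels = [10.0, 20.0, 40.0, 80.0]
data = {x: [1.2 * x * math.exp(random.gauss(0.0, 0.15)) for _ in range(200)]
        for x in levels}

# Candidate transformations of the response variable.
transforms = {
    "raw": lambda y: y,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda y: 1.0 / y,
}

def spread_ratio(transform):
    """Ratio of the largest to the smallest within-level standard deviation."""
    sds = [statistics.stdev([transform(y) for y in ys]) for ys in data.values()]
    return max(sds) / min(sds)

ratios = {name: spread_ratio(t) for name, t in transforms.items()}
for name, r in ratios.items():
    print(f"{name:>10s}: max SD / min SD = {r:.2f}")

# The transformation with the ratio closest to 1 is the best candidate.
best = min(ratios, key=ratios.get)
print("most homogeneous:", best)
```

A numerical summary like this complements the plots but does not replace them, since a single ratio can hide patterns that are obvious to the eye.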
||
| Plot of Common Transformations to Linearize the Fit |
One problem with applying the above transformation is that
the plot indicates that a straight line fit will no longer
be an adequate model for the data. We address this problem
by attempting to find a transformation of the
predictor variable that will result in the most
linear fit. In practice, the square root, log, and
reciprocal transformations often work well for
this purpose. We will try these first.
This plot shows that the log transformation of the predictor variable is a good candidate. |
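One way to put a number on "most linear" is the correlation between the transformed predictor and the (already transformed) response. The sketch below assumes synthetic power-law data rather than the pipeline measurements, and the `pearson` helper and variable names are illustrative only.

```python
import math
import random

random.seed(1)

# Synthetic power-law data (an assumption, not the pipeline measurements):
# log(y) is linear in log(x) plus a small amount of noise.
x = [float(v) for v in range(1, 101)]
log_y = [0.3 + 0.9 * math.log(v) + random.gauss(0.0, 0.1) for v in x]

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Candidate transformations of the predictor variable.
transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
    "reciprocal": lambda v: 1.0 / v,
}

# |r| close to 1 means a straight line describes the transformed data well.
linearity = {name: abs(pearson([t(v) for v in x], log_y))
             for name, t in transforms.items()}
for name, r in linearity.items():
    print(f"{name:>10s}: |r| = {r:.4f}")

best = max(linearity, key=linearity.get)
print("most linear:", best)
```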
||
| Box-Cox Linearity Plot |
The previous step can be approached more formally
by the use of the
Box-Cox
linearity plot. The x value corresponding to
the maximum y value on the plot indicates the
power transformation that yields the most linear
fit.
This plot indicates that a value of -0.1 achieves the most linear fit. In practice, for ease of interpretation, we often prefer to use a common transformation, such as the log or square root, rather than the value that yields the mathematical maximum. However, the Box-Cox linearity plot still indicates whether our choice is a reasonable one. That is, we might sacrifice a small amount of linearity in the fit to have a simpler model. In this case, a value of 0.0 would indicate a log transformation. Although the optimal value from the plot is -0.1, the plot indicates that any value between -0.2 and 0.2 will yield fairly similar results. For that reason, we choose to stick with the common log transformation. |
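The idea behind the Box-Cox linearity plot can be sketched in a few lines: transform the predictor with a range of powers, compute the correlation with the response for each power, and look for the maximum. The example below is a hedged illustration on synthetic data, not a reproduction of the plot above. The grid of powers, the noise level, and the helper names `box_cox` and `pearson` are assumptions made for the sketch.

```python
import math
import random

random.seed(2)

# Synthetic data (an assumption): the response is linear in log(x).
x = [10 ** (i / 40) for i in range(81)]   # 1 to 100, evenly spaced in log(x)
y = [0.3 + 0.9 * math.log(v) + random.gauss(0.0, 0.05) for v in x]

def box_cox(v, lam):
    """Box-Cox power transformation; lam = 0 corresponds to the log."""
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam

def pearson(u, w):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    suw = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    suu = sum((a - mu) ** 2 for a in u)
    sww = sum((b - mw) ** 2 for b in w)
    return suw / math.sqrt(suu * sww)

# Correlation between the response and the transformed predictor
# for each power on a grid from -2.0 to 2.0 in steps of 0.1.
lambdas = [i / 10 for i in range(-20, 21)]
corr = {lam: pearson([box_cox(v, lam) for v in x], y) for lam in lambdas}

best_lam = max(corr, key=corr.get)
print("power with the highest correlation:", best_lam)
print("correlation at that power:", round(corr[best_lam], 4))
print("correlation at 0.0 (log): ", round(corr[0.0], 4))
```

Comparing the correlation at the best power with the correlation at 0.0 mirrors the reasoning in the text: if the two are nearly equal, the simpler log transformation costs very little linearity.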
||
| Log-Log Fit |
Based on the above plots, we choose to fit a log-log model.
Dataplot generated the following output for this model
(it is edited slightly for display).
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N                  =  107
NUMBER OF VARIABLES            =  1
REPLICATION CASE
REPLICATION STANDARD DEVIATION =  0.1369758099D+00
REPLICATION DEGREES OF FREEDOM =  29
NUMBER OF DISTINCT SUBSETS     =  78

     PARAMETER ESTIMATES    (APPROX. ST. DEV.)   T VALUE
1  A0             0.281384    (0.8093E-01)         3.5
2  A1   XTEMP     0.885175    (0.2302E-01)         38.

RESIDUAL    STANDARD DEVIATION =  0.1682604253
RESIDUAL    DEGREES OF FREEDOM =  105
REPLICATION STANDARD DEVIATION =  0.1369758099
REPLICATION DEGREES OF FREEDOM =  29
LACK OF FIT F RATIO = 1.7032 = THE 94.4923% POINT OF THE
F DISTRIBUTION WITH 76 AND 29 DEGREES OF FREEDOM
Note that although the residual standard deviation is significantly
lower than it was for the original fit, we cannot compare them
directly since the fits were performed on different scales.
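As a check on how the pieces of the output fit together, the t values and the lack-of-fit F ratio can be recomputed from the reported estimates, standard deviations, and degrees of freedom. The sketch below uses the standard lack-of-fit decomposition (residual sum of squares = pure-error sum of squares + lack-of-fit sum of squares). It is an illustration of the arithmetic, not Dataplot's internal code, and the variable names are our own.

```python
# Values copied from the Dataplot output above.
n = 107                       # sample size
n_params = 2                  # A0 and A1
n_distinct = 78               # distinct subsets of the predictor
s_res = 0.1682604253          # residual standard deviation
s_rep = 0.1369758099          # replication standard deviation

# Degrees of freedom.
df_res = n - n_params         # 105
df_rep = n - n_distinct       # 29
df_lof = df_res - df_rep      # 76

# Sums of squares recovered from the standard deviations.
ss_res = s_res ** 2 * df_res  # residual sum of squares
ss_rep = s_rep ** 2 * df_rep  # pure-error (replication) sum of squares
ss_lof = ss_res - ss_rep      # lack-of-fit sum of squares

# Lack-of-fit F ratio: lack-of-fit mean square over pure-error mean square.
f_ratio = (ss_lof / df_lof) / (ss_rep / df_rep)
print(f"lack-of-fit F = {f_ratio:.4f} on {df_lof} and {df_rep} df")

# t values: estimate divided by its approximate standard deviation.
t_a0 = 0.281384 / 0.8093e-01
t_a1 = 0.885175 / 0.2302e-01
print(f"t(A0) = {t_a0:.1f}, t(A1) = {t_a1:.1f}")
```

The recomputed F ratio agrees with the reported 1.7032 to four decimal places, and the degrees of freedom match the 76 and 29 shown in the output.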
|
||
| Plot of Predicted Values |
The plot of the predicted values with the transformed data indicates a good fit. In addition, the variability of the data across the horizontal range of the plot seems relatively constant. |
||
| 6-Plot of Fit |
Since we transformed the data, we need to validate that
all of the regression assumptions are still satisfied.
The 6-plot of the residuals indicates that all of the regression assumptions are now satisfied. |
||
| Plot of Residuals |
In order to see more detail, we generate a full size version of the residuals versus predictor variable plot. This plot clearly shows that the residuals now satisfy the assumption of homogeneous variances. |
||