|
4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.2. Alaska Pipeline
|
|||
| Plot of Raw Data |
As with any regression problem, it is always a good idea to plot the raw data first. The following is a scatter plot of the raw data.
This scatter plot shows that a straight line fit is a good initial candidate model for this data. |
||
| Plot by Batch |
This data was collected in six distinct batches. The first step in the analysis is to determine if there is a batch effect.
In this case, the scientist was not inherently interested in the batch. That is, batch is a nuisance factor and, if reasonable, we would like to analyze the data as if it came from a single batch. However, we need to ensure that this is, in fact, a reasonable assumption to make. |
||
| Conditional Plot |
We first generate a conditional plot where we condition on the batch.
This conditional plot shows a scatter plot for each of the six batches on a single page. Each of these plots shows a similar pattern. |
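A conditional plot of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for this case study: the function names `group_by_batch` and `plot_conditional`, the `(batch, x, y)` row layout, and the sample rows are all hypothetical, and the plotting step assumes matplotlib is installed.

```python
from collections import defaultdict


def group_by_batch(rows):
    """Split (batch, x, y) rows into {batch: ([x, ...], [y, ...])}."""
    groups = defaultdict(lambda: ([], []))
    for batch, x, y in rows:
        xs, ys = groups[batch]
        xs.append(x)
        ys.append(y)
    return dict(groups)


def plot_conditional(groups, ncols=3):
    """Draw one scatter panel per batch on shared axes (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    nrows = -(-len(groups) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True, squeeze=False)
    panels = list(axes.flat)
    for ax, batch in zip(panels, sorted(groups)):
        xs, ys = groups[batch]
        ax.scatter(xs, ys, s=10)
        ax.set_title(f"Batch {batch}")
    # Hide any panels left over when the grid has more cells than batches.
    for ax in panels[len(groups):]:
        ax.set_visible(False)
    return fig


# Made-up illustrative rows, NOT the pipeline measurements.
rows = [(1, 10.0, 9.0), (1, 20.0, 17.5), (2, 12.0, 11.0), (2, 30.0, 24.0)]
groups = group_by_batch(rows)
```

Sharing the x and y axes across panels is a deliberate choice in this sketch: it keeps the scales identical, so a batch with a different slope or spread stands out when the panels are compared side by side.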
||
| Linear Correlation and Related Plots |
We can follow up the conditional plot with a linear correlation plot, a linear intercept plot, a linear slope plot, and a linear residual standard deviation plot. These four plots show the correlation, the intercept and slope, and the residual standard deviation from a linear fit applied to each batch, so we can see how a linear fit performs across the six batches.
The linear correlation plot shows that batch six has a somewhat higher correlation than the others (this is also reflected in the significantly lower residual standard deviation for batch six). The slopes all lie within the 0.6 to 0.9 range and the intercepts all lie between 2 and 8. |
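The four per-batch statistics behind these plots can be computed with an ordinary least-squares fit on each batch. The sketch below uses only the Python standard library; the function name `linear_fit_stats`, the dictionary keys, and the two small batches of numbers are hypothetical and are not the pipeline data. The residual standard deviation is assumed here to use n - 2 degrees of freedom, the usual convention for a straight-line fit.

```python
import math


def linear_fit_stats(xs, ys):
    """Fit y = intercept + slope * x by least squares and summarize the fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate a residual SD")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return {
        "correlation": sxy / math.sqrt(sxx * syy),
        "intercept": intercept,
        "slope": slope,
        "res_sd": math.sqrt(sse / (n - 2)),  # assumed n - 2 degrees of freedom
    }


# Made-up illustrative batches, NOT the pipeline measurements.
batches = {
    1: ([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8]),
    2: ([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]),
}
stats = {b: linear_fit_stats(xs, ys) for b, (xs, ys) in batches.items()}
for b in sorted(stats):
    s = stats[b]
    print(f"batch {b}: r={s['correlation']:.3f} intercept={s['intercept']:.3f} "
          f"slope={s['slope']:.3f} res_sd={s['res_sd']:.3f}")
```

Plotting each of the four values against the batch number would give plots of the kind described above: a batch whose point sits well away from the others on any of them is a candidate for a batch effect.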
||
| Treat BATCH as Homogeneous |
These summary plots, in conjunction with the conditional plot above, show that it is reasonable to treat the data as a single batch. None of the batches behaves badly compared to the others and none of the batches requires a significantly different fit from the others.
The conditional plot and the plots of the fit statistics complement each other. The plots of the fit statistics allow quick and convenient comparisons of the overall fits. However, the conditional plot can reveal details that may be hidden in the summary plots. For example, we can more readily determine the existence of clusters of points and outliers, curvature in the data, and other similar features. Based on these plots we will ignore the BATCH variable for the remaining analysis. |
||