4. Process Modeling
4.6. Case Studies in Process Modeling
4.6.2. Alaska Pipeline

4.6.2.2. Check for Batch Effect

Plot of Raw Data

As with any regression problem, it is a good idea to plot the raw data first. The following is a scatter plot of the raw data.

[Figure: scatter plot of the raw data, indicating that a linear fit might be appropriate]

This scatter plot shows that a straight-line fit is a good initial candidate model for these data.

Plot by Batch

The data were collected in six distinct batches. The first step in the analysis is to determine whether there is a batch effect.

In this case, the scientist was not inherently interested in the batch. That is, batch is a nuisance factor and, if reasonable, we would like to analyze the data as if they came from a single batch. However, we need to ensure that this is, in fact, a reasonable assumption to make.

Conditional Plot

We first generate a conditional plot where we condition on the batch.

[Figure: conditional plot (on batch), which does not show a batch effect]

This conditional plot shows a scatter plot for each of the six batches on a single page. Each of these plots shows a similar pattern.
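As an illustration of how such a conditional plot could be produced, here is a minimal Python sketch. It is not the software used for the figure above. The function names `group_by_batch` and `conditional_plot` are hypothetical, the plotting step assumes matplotlib is installed, and the caller supplies the batch labels and the two measurement columns as plain sequences.

```python
from collections import defaultdict


def group_by_batch(batch, x, y):
    """Split paired (x, y) observations into per-batch lists.

    Returns a dict mapping each batch label to a tuple (xs, ys),
    ordered by batch label.
    """
    groups = defaultdict(lambda: ([], []))
    for b, xi, yi in zip(batch, x, y):
        groups[b][0].append(xi)
        groups[b][1].append(yi)
    return dict(sorted(groups.items()))


def conditional_plot(batch, x, y, ncols=3):
    """Draw one scatter panel per batch on a shared set of axes limits.

    Assumption: matplotlib is available. It is imported here so that
    the grouping helper above can be used without it.
    """
    import matplotlib.pyplot as plt

    groups = group_by_batch(batch, x, y)
    nrows = -(-len(groups) // ncols)  # ceiling division
    # Shared axes make the panels directly comparable across batches.
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             squeeze=False)
    flat = axes.ravel()
    for ax, (b, (xs, ys)) in zip(flat, groups.items()):
        ax.scatter(xs, ys, s=10)
        ax.set_title(f"batch {b}")
    # Hide any unused panels in the grid.
    for ax in flat[len(groups):]:
        ax.set_visible(False)
    return fig
```

Sharing the x and y axes across panels is a deliberate choice in this sketch: a batch whose points follow a different line or spread is easier to spot when every panel uses the same scale.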

Linear Correlation and Related Plots

We can follow up the conditional plot with a linear correlation plot, a linear intercept plot, a linear slope plot, and a linear residual standard deviation plot. These four plots show the correlation, the intercept and slope from a linear fit, and the residual standard deviation of linear fits applied to each batch. Together they show how a linear fit performs across the six batches.
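The four per-batch statistics described above can be computed with a few lines of code. The following is a minimal Python sketch using only the standard library. The function names `linear_fit_stats` and `fit_by_batch` are hypothetical, and the residual standard deviation is assumed to use n - 2 degrees of freedom, the usual convention for a two-parameter straight-line fit.

```python
import math
from collections import defaultdict


def linear_fit_stats(x, y):
    """Least-squares straight-line fit y = intercept + slope * x.

    Returns the correlation, intercept, slope, and residual standard
    deviation (assumed here to use n - 2 degrees of freedom).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary within the batch")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ressd = math.sqrt(sse / (n - 2))
    return {"correlation": correlation, "intercept": intercept,
            "slope": slope, "ressd": ressd}


def fit_by_batch(batch, x, y):
    """Apply linear_fit_stats separately to each batch label."""
    groups = defaultdict(lambda: ([], []))
    for b, xi, yi in zip(batch, x, y):
        groups[b][0].append(xi)
        groups[b][1].append(yi)
    return {b: linear_fit_stats(xs, ys) for b, (xs, ys) in sorted(groups.items())}
```

Plotting each of the four returned values against the batch label gives the four summary plots. A batch with a clearly different slope, intercept, or residual standard deviation would stand out in the corresponding plot.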

[Figure: linear correlation, intercept, slope, and residual standard deviation plots, which do not show any significant batch effect]

The linear correlation plot shows that batch six has a somewhat higher correlation than the others, which is also reflected in its clearly lower residual standard deviation. The slopes all lie between 0.6 and 0.9, and the intercepts all lie between 2 and 8.

Treat BATCH as Homogeneous

These summary plots, in conjunction with the conditional plot above, show that treating the data as a single batch is a reasonable assumption to make. None of the batches behaves badly compared to the others, and none requires a significantly different fit from the others.

The conditional plot and the fit-statistic plots complement each other. The fit-statistic plots allow quick and convenient comparison of the overall fits, while the conditional plot can reveal details that are hidden in the summary statistics. For example, it shows more readily whether there are clusters of points, outliers, curvature in the data, or other similar features.

Based on these plots we will ignore the BATCH variable for the remaining analysis.
