This example uses data from a NIST high performance ceramics experiment. The reader may want to download the data as a text file and try using other software packages.
Data Source
This data set was taken from an experiment that was performed a few years ago at NIST (by Said Jahanmir of the Ceramics Division in the Material Science & Engineering Laboratory). The original analysis was done primarily by Lisa Gill of the Statistical Engineering Division. The example shown here is an independent analysis of a modified portion of the original data set.

The original data set was part of a high performance ceramics experiment aimed at characterizing the effect of grinding parameters on sintered reaction bonded silicon nitride, reaction bonded silicon nitride, and sintered silicon nitride. Only modified data from the first of the 3 ceramic types (sintered reaction bonded silicon nitride) will be discussed in this illustrative example of a full factorial data analysis.

Description of Experiment: Response and Factors

Purpose: To determine the effect of machining factors on ceramic strength
Response Variable: Y = Mean (over 15 reps) of Ceramic Strength

Since two factors were qualitative (direction and batch) and it was reasonable to expect monotone effects from the quantitative factors, no center point runs were included. The design matrix, with measured ceramic strength responses, appears below. The actual randomized run order is given in the last column. (The interested reader may download the data as a text file or a JMP file.)
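The layout of a two-level full factorial in the five factors can be sketched in a few lines. This is a minimal illustration, not the NIST worksheet: it builds the 2^5 runs in standard (Yates) order with coded levels -1/+1, counts the model terms of each interaction order (the counts used in Step 2), and shuffles a run order. The names `full_factorial` and `terms_by_order` and the random seed are hypothetical choices for this sketch; the experiment's actual randomized order is the one recorded in the data file.

```python
import math
import random

FACTORS = ["X1: Table Speed", "X2: Feed Rate", "X3: Wheel Grit",
           "X4: Direction", "X5: Batch"]
K = len(FACTORS)


def full_factorial(k):
    """All 2**k runs with coded levels -1/+1, in standard (Yates) order.

    The first factor alternates every run, the second every 2 runs, and so on.
    """
    return [tuple(1 if (i >> j) & 1 else -1 for j in range(k))
            for i in range(2 ** k)]


def terms_by_order(k):
    """Number of model terms of each interaction order (order 0 is the mean)."""
    return {order: math.comb(k, order) for order in range(k + 1)}


design = full_factorial(K)
counts = terms_by_order(K)

# Illustrative randomization only: run_order[i] is the time slot given to
# standard-order run i. The seed is arbitrary; the real order is in the data file.
run_order = list(range(1, len(design) + 1))
random.Random(2024).shuffle(run_order)

print(len(design))                       # 32 runs
print(counts)                            # {0: 1, 1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(sum(counts.values()))              # 32 parameters in the saturated model
print(sum(counts[o] for o in range(4)))  # 26 parameters up to third order
```

Every coded column is balanced (16 runs at each level), which is what makes the effect estimates in the later steps orthogonal.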
Analysis follows the previously described 5 basic steps.

Analysis of the Experiment
The experimental data will be analyzed using SAS JMP 3.2.6 software.

Step 1: Look at the data

We start by plotting the response data several ways to see if any trends or anomalies appear that would not be accounted for by the standard linear response models. First we look at the distribution of all the responses irrespective of factor levels.
Clearly there is "structure" that we hope to account for when we fit a response model. For example, note the separation of the response into two roughly equal-sized clumps. Next we look at the responses plotted versus run order to check whether there might be a time sequence component affecting the response levels.

Plot of Response Vs. Run Order
As hoped for, this plot does not indicate that time order had much to do with the response levels. Next, we look at plots of the responses sorted by factor columns.
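Sorting the responses by a factor column amounts to comparing the average response at that factor's two levels; the difference of those two averages is the estimated effect, and the same idea applies to an interaction by using the product of coded columns. A minimal sketch, assuming factor levels coded -1/+1, is below. The function `effect` is hypothetical, and the demonstration uses a synthetic 2^2 response built from a known formula, not the ceramic strength data.

```python
from statistics import mean


def effect(design, y, columns):
    """Estimated effect of a main effect or interaction in a 2-level factorial.

    `columns` is a tuple of factor indices, e.g. (3,) for X4 or (0, 1) for X1*X2.
    The contrast column is the product of the coded (-1/+1) columns; the effect
    is the mean response where that product is +1 minus the mean where it is -1.
    """
    signs = []
    for run in design:
        s = 1
        for j in columns:
            s *= run[j]
        signs.append(s)
    hi = mean(v for v, s in zip(y, signs) if s == 1)
    lo = mean(v for v, s in zip(y, signs) if s == -1)
    return hi - lo


# Synthetic 2^2 example (NOT the ceramic data): y = 10 + 3*x1 + x1*x2,
# so the X1 effect should be 2*3 = 6, X2 should be 0, and X1*X2 should be 2.
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [10 + 3 * a + a * b for a, b in design]
print(effect(design, y, (0,)), effect(design, y, (1,)), effect(design, y, (0, 1)))
```

Note that an effect defined this way is twice the corresponding regression coefficient in a model with -1/+1 coding.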
Several factors, most notably "Direction" followed by "Batch" and possibly
"Wheel Grit", appear to change the average response level.
Step 2: Create the theoretical model

With a 2^5 full factorial experiment we can fit a model containing a mean term, all 5 main effect terms, all 10 second order interaction terms, all 10 third order interaction terms, all 5 fourth order interaction terms and the fifth order interaction term (32 parameters). However, we start by assuming all fourth order and higher interaction terms are non-existent (it's very rare for such high order interactions to be significant, and they are very difficult to interpret from an engineering viewpoint). That allows us to pool the sums of squares for these 6 terms and use them to estimate an error term with 6 degrees of freedom. So we start out with a theoretical model with 26 unknown constants, hoping the data will clarify which of these are the significant main effects and interactions we need for a final model.

Step 3: Create the actual model from the data

After fitting the 26 parameter model, the following analysis table is displayed:

Output after Fitting Third Order Model to Response Data
Response: Y: Strength
Summary of Fit
Effect Test
Source | DF | Sum of Squares | F Ratio | Prob>F
X1: Table Speed | 1 | 894.33 | 2.8175 | 0.1442
X2: Feed Rate | 1 | 3497.20 | 11.0175 | 0.0160
X1: Table Speed*X2: Feed Rate | 1 | 4872.57 | 15.3505 | 0.0078
X3: Wheel Grit | 1 | 12663.96 | 39.8964 | 0.0007
X1: Table Speed*X3: Wheel Grit | 1 | 1838.76 | 5.7928 | 0.0528
X2: Feed Rate*X3: Wheel Grit | 1 | 307.46 | 0.9686 | 0.3630
X1: Table Speed*X2: Feed Rate*X3: Wheel Grit | 1 | 357.05 | 1.1248 | 0.3297
X4: Direction | 1 | 315132.65 | 992.7901 | <.0001
X1: Table Speed*X4: Direction | 1 | 1637.21 | 5.1578 | 0.0636
X2: Feed Rate*X4: Direction | 1 | 1972.71 | 6.2148 | 0.0470
X1: Table Speed*X2: Feed Rate*X4: Direction | 1 | 5895.62 | 18.5735 | 0.0050
X3: Wheel Grit*X4: Direction | 1 | 3158.34 | 9.9500 | 0.0197
X1: Table Speed*X3: Wheel Grit*X4: Direction | 1 | 2.12 | 0.0067 | 0.9376
X2: Feed Rate*X3: Wheel Grit*X4: Direction | 1 | 44.49 | 0.1401 | 0.7210
X5: Batch | 1 | 33653.91 | 106.0229 | <.0001
X1: Table Speed*X5: Batch | 1 | 465.05 | 1.4651 | 0.2716
X2: Feed Rate*X5: Batch | 1 | 199.15 | 0.6274 | 0.4585
X1: Table Speed*X2: Feed Rate*X5: Batch | 1 | 144.71 | 0.4559 | 0.5247
X3: Wheel Grit*X5: Batch | 1 | 29.36 | 0.0925 | 0.7713
X1: Table Speed*X3: Wheel Grit*X5: Batch | 1 | 30.36 | 0.0957 | 0.7676
X2: Feed Rate*X3: Wheel Grit*X5: Batch | 1 | 25.58 | 0.0806 | 0.7860
X4: Direction*X5: Batch | 1 | 1328.83 | 4.1863 | 0.0867
X1: Table Speed*X4: Direction*X5: Batch | 1 | 544.58 | 1.7156 | 0.2382
X2: Feed Rate*X4: Direction*X5: Batch | 1 | 167.31 | 0.5271 | 0.4952
X3: Wheel Grit*X4: Direction*X5: Batch | 1 | 32.46 | 0.1023 | 0.7600

This fit has a high R squared and adjusted R squared, but the large number of high (>0.10) p-values (in the "Prob>F" column) makes it clear that the model has many unnecessary parameters. Starting with these 26 parameters, we next use the JMP Stepwise Regression option to eliminate unnecessary parameters. By a combination of stepwise regression and the removal of remaining terms with a p-value higher than 0.05, we quickly arrive at a model with a mean term and 12 significant effect terms.

Output after Fitting 12 Effect Model to Response Data
Response: Y: Strength
Effect Test
Source | DF | Sum of Squares | F Ratio | Prob>F
X2: Feed Rate | 1 | 3497.20 | 15.6191 | 0.0009
X1: Table Speed*X2: Feed Rate | 1 | 4872.57 | 21.7618 | 0.0002
X3: Wheel Grit | 1 | 12663.96 | 56.5595 | <.0001
X1: Table Speed*X3: Wheel Grit | 1 | 1838.76 | 8.2122 | 0.0099
X4: Direction | 1 | 315132.65 | 1407.4390 | <.0001
X1: Table Speed*X4: Direction | 1 | 1637.21 | 7.3121 | 0.0141
X2: Feed Rate*X4: Direction | 1 | 1972.71 | 8.8105 | 0.0079
X1: Table Speed*X2: Feed Rate*X4: Direction | 1 | 5895.62 | 26.3309 | <.0001
X3: Wheel Grit*X4: Direction | 1 | 3158.34 | 14.1057 | 0.0013
X5: Batch | 1 | 33653.91 | 150.3044 | <.0001
X4: Direction*X5: Batch | 1 | 1328.83 | 5.9348 | 0.0249

Note that we would have arrived at the exact same 12 terms by looking at a normal plot of the full (saturated) model with all 31 effects plotted. This plot is shown below.
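The analysis tables report sums of squares rather than signed effects, so the sketch below builds a half-normal variant of the effects plot from them (a swapped-in variant, since signs cannot be recovered from SS). For a two-level factorial with N runs, SS = N * effect^2 / 4, so |effect| = 2 * sqrt(SS / N). The sketch uses the 25 sums of squares from the third order fit above; the six fourth and fifth order terms were pooled into error and are not listed, so this is not the full 31-effect plot. The function names and the plotting-position formula are assumptions of this sketch, not JMP's procedure.

```python
import math
from statistics import NormalDist

N_RUNS = 32  # 2^5 full factorial

# Sums of squares from the third order fit (25 of the 31 effects).
SS = {
    "X1": 894.33, "X2": 3497.20, "X1*X2": 4872.57, "X3": 12663.96,
    "X1*X3": 1838.76, "X2*X3": 307.46, "X1*X2*X3": 357.05,
    "X4": 315132.65, "X1*X4": 1637.21, "X2*X4": 1972.71,
    "X1*X2*X4": 5895.62, "X3*X4": 3158.34, "X1*X3*X4": 2.12,
    "X2*X3*X4": 44.49, "X5": 33653.91, "X1*X5": 465.05, "X2*X5": 199.15,
    "X1*X2*X5": 144.71, "X3*X5": 29.36, "X1*X3*X5": 30.36,
    "X2*X3*X5": 25.58, "X4*X5": 1328.83, "X1*X4*X5": 544.58,
    "X2*X4*X5": 167.31, "X3*X4*X5": 32.46,
}


def abs_effect(ss, n=N_RUNS):
    """|effect| of a 2-level factorial contrast, from SS = n * effect**2 / 4."""
    return 2.0 * math.sqrt(ss / n)


def half_normal_points(ss_by_term, n=N_RUNS):
    """(term, |effect|, half-normal quantile) tuples, smallest |effect| first."""
    ranked = sorted(ss_by_term.items(), key=lambda kv: kv[1])  # SS order = |effect| order
    m = len(ranked)
    z = NormalDist()
    points = []
    for i, (term, ss) in enumerate(ranked, start=1):
        p = 0.5 + 0.5 * (i - 0.5) / m  # plotting position on the half-normal scale
        points.append((term, abs_effect(ss, n), z.inv_cdf(p)))
    return points


for term, eff, q in half_normal_points(SS)[-12:]:
    print(f"{term:10s} |effect| = {eff:7.2f}   quantile = {q:.2f}")
```

In this ranking the eleven largest |effect| values are exactly the eleven terms of the 12 effect model other than X1, and X1 comes next, which is consistent with X1 not standing out on its own.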
Most of the effect points cluster close to the center (zero) line and follow the fitted normal model straight line. The effects that appear to be above or below the line are the same effects identified using the stepwise routine, with the exception of X1 (which has to be included in the model since several interaction effects involving X1 appear significant).

At this stage, this model appears to account for most of the variability in the response, achieving an adjusted R squared of 0.982. All the main effects are significant, as are 6 second order interactions and 1 third order interaction. The only interaction that makes little physical sense is the "X4: Direction*X5: Batch" interaction: why would one batch of material react differently when cut in a different direction as compared to another batch of the same formulation? However, before accepting any model, residuals need to be examined.

Step 4: Test the model assumptions using residual graphs (adjust and simplify as needed)

First we look at the residuals plotted versus the predicted responses.
The residuals appear to spread out more with larger values of predicted strength, which should not happen if the errors have a common (constant) variance. Next we examine the normality of the residuals with a normal quantile plot, a box plot and a histogram.
None of these plots appear to show typical normal residuals, and 4 out of the 32 data points show up as outliers in the box plot.

Step 4 continued: Transform the data and fit the model again

We next look at whether we can model a transformation of the response variable and obtain better-behaved residuals. JMP calculates an optimum Box-Cox transformation by finding the value of lambda that minimizes the model SSE. Note: the Box-Cox transformation used in JMP is different from the transformation used in Dataplot, but roughly equivalent.
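The lambda search can be sketched as a grid search. This sketch assumes the geometric-mean-scaled form z = (y^lambda - 1) / (lambda * g^(lambda - 1)), with z = g * ln(y) at lambda = 0 and g the geometric mean of y; the scaling keeps SSE values comparable across lambda. Whether this matches JMP's exact formula is an assumption. Because distinct effect columns of a full factorial are orthogonal, each least-squares coefficient is simply (column . z) / n, so no matrix library is needed. The helper names and the synthetic 2^2 example are hypothetical, not the ceramic data.

```python
import math


def geometric_mean(y):
    return math.exp(sum(math.log(v) for v in y) / len(y))


def boxcox_scaled(y, lam):
    """Geometric-mean-scaled Box-Cox transform of positive responses y."""
    g = geometric_mean(y)
    if abs(lam) < 1e-12:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1.0) / (lam * g ** (lam - 1.0)) for v in y]


def model_sse(design, z, model):
    """SSE of a least-squares fit of z on a mean term plus the given effects.

    Assumes a 2-level full factorial with each run once, so distinct effect
    columns are orthogonal and each coefficient is (column . z) / n.
    `model` is a list of tuples of factor indices, e.g. [(0,), (1,), (0, 1)].
    """
    n = len(z)
    fitted = [sum(z) / n] * n  # mean term
    for term in model:
        col = [math.prod(run[j] for j in term) for run in design]
        b = sum(c * v for c, v in zip(col, z)) / n
        fitted = [f + b * c for f, c in zip(fitted, col)]
    return sum((v - f) ** 2 for v, f in zip(z, fitted))


def best_lambda(design, y, model, grid):
    """Grid value of lambda whose transformed response gives the smallest SSE."""
    return min(grid, key=lambda lam: model_sse(design, boxcox_scaled(y, lam), model))


# Synthetic 2^2 example: exactly additive in X1 and X2 on the raw scale.
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [10 + 2 * a + 3 * b for a, b in design]
grid = [i / 10 for i in range(-20, 21)]
print(best_lambda(design, y, [(0,), (1,)], grid))
```

On these synthetic data the search returns lambda = 1 (no transformation), since any other lambda introduces curvature that the additive model cannot fit. Run on the 32 strength responses with the 12 effect model, a search of this kind plays the role of the JMP step that reported lambda = 0.2.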
Box-Cox Transformation Graph
The optimum is found at lambda = 0.2, and a new column Y: Strength X is calculated and added to the JMP data spreadsheet. The properties of this column, showing the transformation equation, are shown below.

Data Transformation Column Properties
When the 12 effect model is fit to the transformed data, the "X4: Direction*X5: Batch" interaction term is no longer significant. The 11 effect model fit is shown below, with parameter estimates and p-values.

Output after Fitting 11 Effect Model to Transformed Response Data
Response: Y: Strength X
Effect | Parameter Estimate | p-value
This model has a very high R squared and adjusted R squared. The residual plots (shown below) are quite a bit better behaved than before, and pass the Shapiro-Wilk test for normality.
Step 5: Answer the questions in your experimental objectives

The magnitudes of the model parameters show that "Direction" is by far the most important factor. "Batch" plays the next most critical role, followed by "Wheel Grit". Then, there are several important interactions followed by "Feed Rate". "Table Speed" plays a role in almost every significant interaction term, but is the least important main effect on its own. Plots of the main effects and the significant 2-way interactions are shown below.
To determine the best setting to use for maximum ceramic strength, JMP has the "Prediction Profile" option shown below.

Prediction Profile (Y: Strength X)
The vertical lines indicate the optimal factor settings to maximize the (transformed) strength response. Translating from -1 and +1 back to the actual factor settings we have: Table Speed at "+1" or 0.125 m/s; Down Feed Rate at "+1" or 0.125 mm; Wheel Grit at "-1" or 140/170; and Direction at "-1" or longitudinal.

Unfortunately, "Batch" is also a very significant factor, with the first batch giving higher strengths than the second. Unless it is possible to learn what worked well with this batch, and how to repeat it, not much can be done about this factor.

Comments

1. One might ask what an analysis of just the 2^4 factorial with "Direction" kept at -1 (i.e., longitudinal) would yield. This analysis turns out to have a very simple model; only "Wheel Grit" and "Batch" are significant main effects and no interactions are significant. If, on the other hand, we do an analysis of the 2^4 factorial with "Direction" kept at +1 (i.e., transverse), then we get a 7 parameter model with all the main effects and interactions we saw in the 2^5 analysis, except, of course, any terms involving "Direction". So it appears that the complex model of the full analysis came from the physical properties of a transverse cut, and these complexities are not present for longitudinal cuts.

2. If we had assumed that three factor and higher interactions were negligible before experimenting, a resolution V 2^(5-1) half fraction design might have been chosen. In hindsight, we would have gotten valid estimates for all main effects and two factor interactions except for the X3*X5 interaction, which would have been aliased with X1*X2*X4 in that half fraction.

3. Finally, we note that many analysts might prefer to adopt a natural logarithm transformation (i.e., use ln Y) as the response, instead of using a Box-Cox transformation with an exponent of 0.2. The natural logarithm transformation corresponds to an exponent of lambda = 0 in the Box-Cox graph.
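The aliasing behind Comment 2 can be checked mechanically. The sketch assumes the usual resolution V defining relation I = X1*X2*X3*X4*X5 (generator X5 = X1*X2*X3*X4), which the text does not state explicitly. Multiplying an effect by the defining word cancels squared factors, which is a symmetric difference of factor sets. The helper names are hypothetical.

```python
from itertools import combinations

FACTORS = ("X1", "X2", "X3", "X4", "X5")
# Assumed defining relation for the half fraction: I = X1*X2*X3*X4*X5.
DEFINING_WORD = frozenset(FACTORS)


def alias(term):
    """Alias of an effect: multiply by the defining word (squared factors cancel)."""
    return frozenset(term) ^ DEFINING_WORD


def name(term):
    return "*".join(sorted(term)) if term else "I"


# The only third order interaction kept in the 12 effect model.
significant_3fi = {frozenset({"X1", "X2", "X4"})}

for pair in combinations(FACTORS, 2):
    a = alias(pair)
    if a in significant_3fi:
        print(f"{name(pair)} is aliased with {name(a)}")
```

Under this assumed generator, X3*X5 is the only two factor interaction that shares its column with X1*X2*X4, and every main effect is aliased with a four factor interaction, so the main effects would still have been estimated cleanly.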