5. Process Improvement
5.4. Analysis of DOE data
5.4.7. Examples of DOE's

5.4.7.1. Full factorial example

This example uses data from a NIST high performance ceramics experiment.

The reader may want to download the data as a text file and try using other software packages.

Data Source

This data set was taken from an experiment that was performed a few years ago at NIST (by Said Jahanmir of the Ceramics Division in the Material Science & Engineering Laboratory). The original analysis was done primarily by Lisa Gill of the Statistical Engineering Division. The example shown here is an independent analysis of a modified portion of the original data set. 

The original data set was part of a high performance ceramics experiment aimed at characterizing the effect of grinding parameters on sintered reaction bonded silicon nitride, reaction bonded silicon nitride, and sintered silicon nitride.

Only modified data from the first (sintered reaction bonded silicon nitride) of the 3 ceramic types will be discussed in this illustrative example of a full factorial data analysis.

Description of Experiment: Response and Factors 

Purpose: To determine the effect of machining factors on ceramic strength
Response variable = mean (over 15 repetitions) of the ceramic strength
Number of observations = 32 (a complete 2^5 factorial design)

    Response Variable Y = Mean (over 15 reps) of Ceramic Strength
    Factor  1 = Table Speed    (2 levels: slow (.025 m/s) & fast (.125 m/s))
    Factor  2 = Down Feed Rate (2 levels: slow (.05 mm) & fast (.125 mm))
    Factor  3 = Wheel Grit (2 levels: 140/170 & 80/100)
    Factor  4 = Direction (2 levels: longitudinal & transverse)
    Factor  5 = Batch  (2 levels: 1 & 2)
Since two factors were qualitative (direction and batch) and it was reasonable to expect monotone effects from the quantitative factors, no center point runs were included.
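
As a rough illustration of what a complete 2^5 design looks like, the sketch below enumerates all 32 runs in coded units. The function name `full_factorial`, the standard-order layout, and the coding (-1 for the first level listed for each factor, +1 for the second) are assumptions made for this sketch; the experiment's actual randomized run order is not reproduced here.

```python
from itertools import product

# Factor names as listed above; the coded levels are an assumption of this sketch:
# -1 = first level listed, +1 = second level listed.
FACTORS = ["Table Speed", "Down Feed Rate", "Wheel Grit", "Direction", "Batch"]


def full_factorial(n_factors):
    """Return all 2**n_factors runs in standard order (first factor changes fastest)."""
    runs = []
    for levels in product((-1, 1), repeat=n_factors):
        # product() varies the LAST position fastest, so reverse each tuple
        # to make the first factor the fastest-changing one.
        runs.append(levels[::-1])
    return runs


design = full_factorial(len(FACTORS))

print("run  " + "  ".join(f"X{i + 1:d}" for i in range(len(FACTORS))))
for run_number, run in enumerate(design, start=1):
    print(f"{run_number:3d}  " + "  ".join(f"{level:+d}" for level in run))
```

Every column of such a design contains sixteen -1 entries and sixteen +1 entries, which is what makes the effect estimates mutually orthogonal.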

The design matrix, with measured ceramic strength responses, appears below. The actual randomized run order is given in the last column. (The interested reader may download the data as a text file or a JMP file.)


 

Analysis of the Experiment

Analysis follows the previously described 5 basic steps.

The experimental data will be analyzed using SAS JMP 3.2.6  software.

Step 1: Look at the data

We start by plotting the response data several ways to see if any trends or anomalies appear that would not be accounted for by the standard linear response models. 

First we look at the distribution of all the responses irrespective of factor levels.
 
 






Clearly there is "structure" that we hope to account for when we fit a response model. For example, note the separation of the response into two roughly equal sized clumps.

Next we look at the responses plotted versus run order to check whether there might be a time sequence component affecting the response levels.

Plot of Response Vs. Run Order

As hoped for, this plot does not indicate that time order had much to do with the response levels.

Next, we look at plots of the responses sorted by factor columns. 


 


Several factors, most notably "Direction" followed by "Batch" and possibly "Wheel Grit", appear to change the average response level. 
 

Step 2: Create the theoretical model

With a 2^5 full factorial experiment we can fit a model containing a mean term, all 5 main effects terms, all 10 second order interaction terms, all 10 third order interaction terms, all 5 fourth order interaction terms and the fifth order interaction term (32 parameters). However, we start by assuming all fourth order and higher interaction terms are non-existent (it's very rare for such high order interactions to be significant, and they are very difficult to interpret from an engineering viewpoint). That allows us to accumulate the sums of squares for these terms and use them to estimate an error term. So we start out with a theoretical model with 26 unknown constants, hoping the data will clarify which of these are the significant main effects and interactions we need for a final model.
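
The term counts quoted above are binomial coefficients, and the bookkeeping can be checked with a few lines. This is only an arithmetic sketch; the variable names are illustrative.

```python
from math import comb

N_FACTORS = 5

# Number of terms of each order: order 0 is the mean, order 1 the main effects,
# order 2 the two-factor interactions, and so on.
terms_by_order = [comb(N_FACTORS, order) for order in range(N_FACTORS + 1)]

saturated = sum(terms_by_order)           # every term up to the 5-factor interaction
starting_model = sum(terms_by_order[:4])  # mean + orders 1 through 3
error_df = saturated - starting_model     # pooled 4th and 5th order terms

print("terms by order:", terms_by_order)
print("saturated model parameters:", saturated)
print("starting model parameters:", starting_model)
print("degrees of freedom left for error:", error_df)
```

With 32 observations and 26 fitted constants, the 6 pooled high order terms supply the 6 degrees of freedom used for the error estimate.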

Step 3: Create the actual model from the data

After fitting the 26 parameter model, the following analysis table is displayed: 

Output after Fitting Third Order Model to Response Data

Response:     Y: Strength

Summary of Fit
RSquare         0.995127
RSquare Adj  0.974821
Root Mean Square Error     17.81632
Mean of Response            546.8959
Observations  32

Effect Test

Source                                        DF  Sum of Squares    F Ratio   Prob>F
X1: Table Speed                                1          894.33     2.8175   0.1442
X2: Feed Rate                                  1         3497.20    11.0175   0.0160
X1: Table Speed*X2: Feed Rate                  1         4872.57    15.3505   0.0078
X3: Wheel Grit                                 1        12663.96    39.8964   0.0007
X1: Table Speed*X3: Wheel Grit                 1         1838.76     5.7928   0.0528
X2: Feed Rate*X3: Wheel Grit                   1          307.46     0.9686   0.3630
X1: Table Speed*X2: Feed Rate*X3: Wheel Grit   1          357.05     1.1248   0.3297
X4: Direction                                  1       315132.65   992.7901   <.0001
X1: Table Speed*X4: Direction                  1         1637.21     5.1578   0.0636
X2: Feed Rate*X4: Direction                    1         1972.71     6.2148   0.0470
X1: Table Speed*X2: Feed Rate*X4: Direction    1         5895.62    18.5735   0.0050
X3: Wheel Grit*X4: Direction                   1         3158.34     9.9500   0.0197
X1: Table Speed*X3: Wheel Grit*X4: Direction   1            2.12     0.0067   0.9376
X2: Feed Rate*X3: Wheel Grit*X4: Direction     1           44.49     0.1401   0.7210
X5: Batch                                      1        33653.91   106.0229   <.0001
X1: Table Speed*X5: Batch                      1          465.05     1.4651   0.2716
X2: Feed Rate*X5: Batch                        1          199.15     0.6274   0.4585
X1: Table Speed*X2: Feed Rate*X5: Batch        1          144.71     0.4559   0.5247
X3: Wheel Grit*X5: Batch                       1           29.36     0.0925   0.7713
X1: Table Speed*X3: Wheel Grit*X5: Batch       1           30.36     0.0957   0.7676
X2: Feed Rate*X3: Wheel Grit*X5: Batch         1           25.58     0.0806   0.7860
X4: Direction*X5: Batch                        1         1328.83     4.1863   0.0867
X1: Table Speed*X4: Direction*X5: Batch        1          544.58     1.7156   0.2382
X2: Feed Rate*X4: Direction*X5: Batch          1          167.31     0.5271   0.4952
X3: Wheel Grit*X4: Direction*X5: Batch         1           32.46     0.1023   0.7600
 

This fit has a high R squared and adjusted R squared, but the large number of high (>0.10) p-values (in the "Prob>F" column) makes it clear that the model has many unnecessary parameters.
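
The F ratios in the effect test table can be cross-checked by hand: for a single degree of freedom effect, the ratio is the effect's sum of squares divided by the error mean square, and the error mean square is the square of the reported root mean square error. The sketch below does this for two rows of the table; the helper name `f_ratio` is hypothetical, and small differences in the last digit are expected because the printed inputs are rounded.

```python
def f_ratio(sum_of_squares, df_effect, rmse):
    """F ratio = (effect sum of squares / effect df) / error mean square."""
    error_mean_square = rmse ** 2
    return (sum_of_squares / df_effect) / error_mean_square


RMSE_THIRD_ORDER_MODEL = 17.81632  # "Root Mean Square Error" from the Summary of Fit

f_table_speed = f_ratio(894.33, 1, RMSE_THIRD_ORDER_MODEL)
f_direction = f_ratio(315132.65, 1, RMSE_THIRD_ORDER_MODEL)

print(f"X1: Table Speed  F = {f_table_speed:.4f}  (table reports 2.8175)")
print(f"X4: Direction    F = {f_direction:.4f}  (table reports 992.7901)")
```

Because the error mean square rests on only 6 degrees of freedom in this fit, moderately large F ratios can still correspond to unremarkable p-values.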

Starting with these 26 parameters, we next use the JMP Stepwise Regression option to eliminate unnecessary parameters. By a combination of stepwise regression and the removal of remaining terms with a p-value higher than 0.05, we quickly arrive at a model with a mean term and 12 significant effect terms. 

Output after Fitting 12 Effect Model to Response Data

Response:  Y: Strength
Summary of Fit
RSquare  0.989114
RSquare Adj  0.982239
Root Mean Square Error  14.96346
Mean of Response  546.8959
Observations (or Sum Wgts) 32

Effect Test

Source                                        DF  Sum of Squares    F Ratio   Prob>F
X1: Table Speed                                1          894.33     3.9942   0.0602
X2: Feed Rate                                  1         3497.20    15.6191   0.0009
X1: Table Speed*X2: Feed Rate                  1         4872.57    21.7618   0.0002
X3: Wheel Grit                                 1        12663.96    56.5595   <.0001
X1: Table Speed*X3: Wheel Grit                 1         1838.76     8.2122   0.0099
X4: Direction                                  1       315132.65  1407.4390   <.0001
X1: Table Speed*X4: Direction                  1         1637.21     7.3121   0.0141
X2: Feed Rate*X4: Direction                    1         1972.71     8.8105   0.0079
X1: Table Speed*X2: Feed Rate*X4: Direction    1         5895.62    26.3309   <.0001
X3: Wheel Grit*X4: Direction                   1         3158.34    14.1057   0.0013
X5: Batch                                      1        33653.91   150.3044   <.0001
X4: Direction*X5: Batch                        1         1328.83     5.9348   0.0249
 

Note that we would have arrived at the exact same 12 terms by looking at a normal plot of the full (saturated) model with all 31 effects plotted. This plot is shown below:
 

Most of the effects points cluster close to the center (zero) line and follow the fitted normal model straight line. The effects that appear to be above or below the line are the same effects identified using the stepwise routine, with the exception of X1 (which has to be included in the model since several interaction effects involving X1 appear significant).
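
To make the idea of a normal plot of effects concrete, the sketch below computes all 31 effect estimates for a 2^5 design and pairs the sorted estimates with normal quantiles. The response used here is synthetic and hypothetical (it is not the ceramic strength data), the factor indices are 0-based, and the plotting position (i + 0.5)/m is one common convention chosen for this sketch.

```python
from itertools import combinations, product
from math import prod
from statistics import NormalDist


def effect_estimates(design, y):
    """Effect of each term = (sum of contrast column * response) / (n / 2)."""
    k, n = len(design[0]), len(design)
    effects = {}
    for order in range(1, k + 1):
        for term in combinations(range(k), order):
            contrast = sum(prod(run[i] for i in term) * yi
                           for run, yi in zip(design, y))
            effects[term] = contrast / (n / 2)
    return effects


def normal_plot_points(effects):
    """Pair each sorted effect with a standard normal quantile."""
    ordered = sorted(effects.items(), key=lambda item: item[1])
    m = len(ordered)
    normal = NormalDist()
    return [(normal.inv_cdf((i + 0.5) / m), term, value)
            for i, (term, value) in enumerate(ordered)]


design = list(product((-1, 1), repeat=5))

# Synthetic, noise-free response (hypothetical coefficients, NOT the ceramics data):
# two main effects and one two-factor interaction are active.
y = [500 - 50 * run[3] - 15 * run[4] + 5 * run[0] * run[1] for run in design]

effects = effect_estimates(design, y)
points = normal_plot_points(effects)

for quantile, term, value in points:
    if value != 0:
        print(f"term {term}: effect {value:+.1f} at normal quantile {quantile:+.2f}")
```

In a real data set the inactive effects scatter around zero along a straight line, and the active effects show up as points that fall well off that line at either end.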

At this stage, this model appears to account for most of the variability in the response, achieving an adjusted R squared of 0.982. All 5 main effects are in the model, along with 6 second order interactions and 1 third order interaction; every term except X1 (p-value 0.0602, retained because several significant interactions involve it) is significant at the 0.05 level. The only interaction that makes little physical sense is the "X4: Direction *X5: Batch" interaction: why would one batch of material react differently when cut in a different direction as compared to another batch of the same formulation?
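
The adjusted R squared values reported so far follow from the ordinary R squared, the number of observations, and the number of fitted parameters (including the mean). The sketch below reproduces them from the printed summaries; the function name `adjusted_r_squared` is illustrative, and agreement is only to about five decimals because the inputs are rounded.

```python
def adjusted_r_squared(r_squared, n_obs, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p), with p counting the mean term."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_params)


N_OBS = 32

# 26 parameter model: mean + 25 effect terms
adj_26 = adjusted_r_squared(0.995127, N_OBS, 26)
# 12 effect model: mean + 12 effect terms = 13 parameters
adj_13 = adjusted_r_squared(0.989114, N_OBS, 13)

print(f"26 parameter model: {adj_26:.6f}  (reported 0.974821)")
print(f"12 effect model:    {adj_13:.6f}  (reported 0.982239)")
```

Dropping the 13 weakest terms lowers R squared slightly but raises the adjusted value, which is the usual sign that the removed terms were not earning their degrees of freedom.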

However, before accepting any model, residuals need to be examined.

Step 4: Test the model assumptions using residual graphs (adjust and simplify as needed)

First we look at the residuals plotted versus the predicted responses.
 
 


The residuals appear to spread out more with larger values of predicted strength, which should not happen with common variance.

Next we examine the normality of the residuals with a normal quantile plot, a box plot and a histogram.
 


None of these plots appear to show typical normal residuals and 4 out of the 32 data points show up as outliers in the box plot.

Step 4 continued: Transform the data and fit the model again

We next look at whether we can model a transformation of the response variable and obtain better behaving residuals. JMP calculates an optimum Box-Cox transformation by finding the value of lambda that minimizes the model SSE. Note: the Box-Cox transformation used in JMP is different than the transformation used in Dataplot, but roughly equivalent.
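
For readers without JMP, the sketch below shows one widely used form of the Box-Cox transformation, scaled by the geometric mean of the response so that the model SSE is comparable across values of lambda. That this is the exact form JMP applies is an assumption of the sketch, and the function names and the sample values are hypothetical.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return exp(sum(log(v) for v in values) / len(values))


def boxcox_scaled(values, lam):
    """Geometric-mean-scaled Box-Cox transform (assumed form, see lead-in).

    lam != 0: (y**lam - 1) / (lam * gm**(lam - 1))
    lam == 0: gm * ln(y), the limiting case of the expression above
    """
    gm = geometric_mean(values)
    if lam == 0:
        return [gm * log(v) for v in values]
    scale = lam * gm ** (lam - 1)
    return [(v ** lam - 1) / scale for v in values]


# Hypothetical strength-like values, used only to exercise the function.
sample = [450.0, 520.0, 610.0, 700.0]

for lam in (1.0, 0.2, 0.0):
    transformed = boxcox_scaled(sample, lam)
    print(f"lambda = {lam}: " + ", ".join(f"{t:.1f}" for t in transformed))
```

To search for an optimum, one would refit the model to the transformed response over a grid of lambda values and pick the value giving the smallest SSE; that fitting step is omitted here.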
 
 

Box-Cox Transformation Graph



The optimum is found at lambda = .2 and a new column Y: Strength X is calculated and added to the JMP data spreadsheet. The properties of this column, showing the transformation equation, are shown below. 

Data Transformation Column Properties


When the 12 effect model is fit to the transformed data, the "X4: Direction *X5: Batch" interaction term is no longer significant. The 11 effect model fit is shown below, with parameter estimates and p-values.

Output after Fitting 11 Effect Model to Transformed Response Data

Response:  Y: Strength X
Summary of Fit
RSquare 0.99041
RSquare Adj 0.985135
Root Mean Square Error 13.81065
Mean of Response 1917.115
Observations (or Sum Wgts) 32

          Effect                                                             Parameter Estimate        p-value
    Intercept                                                                   1917.115                  <.0001 
    X1: Table Speed                                                             5.777                   0.0282
    X2: Feed Rate                                                              11.691                   0.0001 
    X1: Table Speed*X2: Feed Rate                                   -14.467                   <.0001 
    X3: Wheel Grit                                                            -21.649                   <.0001
    X1: Table Speed*X3: Wheel Grit                                     7.339                    0.007
    X4: Direction                                                               -99.272                   <.0001
    X1: Table Speed*X4: Direction                                       -7.188                   0.0080 
    X2: Feed Rate*X4: Direction                                          -9.160                   0.0013 
    X1: Table Speed*X2: Feed Rate*X4:Direction                 15.325                   <.0001 
    X3: Wheel Grit*X4: Direction                                         12.965                   <.0001 
    X5: Batch                                                                    -31.871                   <.0001 
 

This model has a very high R squared and adjusted R squared. The residual plots (shown below) are quite a bit better behaved than before, and pass the Shapiro-Wilk test for normality.
 



 



Step 5: Answer the questions in your experimental objectives

The magnitudes of the model parameters show that "Direction" is by far the most important factor. "Batch" plays the next most critical role, followed by "Wheel Grit". Then, there are several important interactions followed by "Feed Rate". "Table Speed" plays a role in almost every significant interaction term, but is the least important main effect on its own. 

Plots of the main effects and the significant 2-way interactions are shown below.
 
 


 
 


To determine the best setting to use for maximum ceramic strength, JMP has the "Prediction Profile" option shown below.

Y: Strength X
Prediction Profile

The vertical lines indicate the optimal factor settings to maximize the (transformed) strength response. Translating from -1 and +1 back to the actual factor settings we have: Table Speed at "1" or .125 m/s; Down Feed Rate at "1" or .125 mm; Wheel Grit at "-1" or 140/170; and Direction at "-1" or longitudinal.

Unfortunately, "Batch" is also a very significant factor, with the first batch giving higher strengths than the second. Unless it is possible to learn what worked well with this batch, and how to repeat it, not much can be done about this factor.

Comments

1. One might ask what an analysis of just the 2^4 factorial with "Direction" kept to -1 (i.e. longitudinal) would yield. This analysis turns out to have a very simple model; only "Wheel Grit" and "Batch" are significant main effects and no interactions are significant.

If, on the other hand, we do an analysis of the 2^4 factorial with "Direction" kept to +1 (i.e. transverse), then we get a 7 parameter model with all the main effects and interactions we saw in the 2^5 analysis, except, of course, any terms involving "Direction".

So it appears that the complex model of the full analysis came from the physical properties of a transverse cut, and these complexities are not present for longitudinal cuts. 

2. If we had assumed that three factor and higher interactions were negligible before experimenting, a 2^(5-1) resolution V half fraction design might have been chosen. In hindsight, we would have gotten valid estimates for all main effects and two factor interactions except for the X3*X5 interaction, which would have been aliased with X1*X2*X4 in that half fraction.

3. Finally, we note that many analysts might prefer to adopt a natural logarithm transformation (i.e. use ln Y) as the response, instead of using a Box-Cox transformation with an exponent of .2. The natural logarithm transformation corresponds to an exponent of lambda = 0 in the Box-Cox graph.
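
The aliasing mentioned in comment 2 can be worked out mechanically. The sketch below assumes the usual defining relation for a resolution V half fraction of five factors, I = X1*X2*X3*X4*X5 (the generator is an assumption; the text does not state it). Multiplying a term by the defining word cancels any factor that appears twice, which is a symmetric difference of the factor sets. The names `alias` and `label` are illustrative.

```python
from itertools import combinations

# Assumed defining relation for the half fraction: I = X1*X2*X3*X4*X5
DEFINING_WORD = frozenset({1, 2, 3, 4, 5})


def alias(term):
    """Alias of a term: multiply by the defining word; squared factors cancel."""
    return frozenset(term) ^ DEFINING_WORD


def label(term):
    """Readable name for a set of factor indices, e.g. {1, 2, 4} -> X1*X2*X4."""
    return "*".join(f"X{i}" for i in sorted(term)) or "I"


print(label({1, 2, 4}), "is aliased with", label(alias({1, 2, 4})))

# Main effects pair with four-factor interactions and two-factor interactions
# pair with three-factor interactions, which is what resolution V means.
for size in (1, 2):
    for term in combinations(sorted(DEFINING_WORD), size):
        print(f"{label(term):8s} <-> {label(alias(term))}")
```

Under this assumed generator, the only two factor interaction whose three factor alias turned out to be active in the fitted model is the one paired with X1*X2*X4.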
