3. Production Process Characterization
3.2. Assumptions / Prerequisites
|
| Description |
There are many instances when we are faced with the analysis
of data where the responses are discrete rather than continuous. Examples
of this are yield (good/bad), speed bins (slow/fast/faster/fastest), survey
results (favor/oppose), etc. We then try to explain the discrete outcomes
with some combination of discrete or continuous explanatory variables. In this situation,
the modeling techniques we have learned so far (CLM
and ANOVA) are no longer appropriate. |
|
There are two primary methods at our disposal for
the analysis of discrete response data. The first one applies to the case where
we have discrete explanatory variables and discrete responses and is known
as Contingency Table Analysis. This model is covered in detail in this
section. The second model applies to the case where we have both discrete
and continuous explanatory variables and is referred to as a Log-Linear
Model. This model is beyond the scope of this book but interested readers
should refer to the reference section of this
chapter for a list of useful books on the topic. |
| Model |
Suppose we have N individuals that we classify according
to two criteria, A and B. Suppose there are r levels of criterion
A and s levels of criterion B. These responses can then be displayed
in an r x s table. For example, suppose we have a box of manufactured
parts that we classify as to whether they are good or bad and whether they
came from supplier 1, 2 or 3. |
|
Now, each cell of this table will have a count of the individuals that
fall into its particular combination of classification levels. Let's call
this count $N_{ij}$ for the cell in row i and column j. The sum of all of
these counts will be equal to the total number of individuals N. Also,
row i of the table will sum to $N_{i\cdot}$ and
column j will sum to $N_{\cdot j}$. |
|
|
Under the assumption that there is no interaction between the two classifying
variables (for example, the number of good or bad parts does not depend on which
supplier they came from), we can calculate the counts we would expect to
see in each cell. Let's call the expected count for any cell $E_{ij}$.
Then the expected value for a cell is calculated according to
$E_{ij} = N_{i\cdot} N_{\cdot j} / N$.
All we need to do then is to compare the expected counts to the observed
counts. If the observed counts differ from the expected counts by more than
chance variation would explain, then the two variables interact in some way. |
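The expected-count formula can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the supplier counts are hypothetical and do not come from the text.

```python
def expected_counts(observed):
    """Return expected counts E_ij = N_i. * N_.j / N under no interaction."""
    row_totals = [sum(row) for row in observed]        # N_i.
    col_totals = [sum(col) for col in zip(*observed)]  # N_.j
    n = sum(row_totals)                                # N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical data: rows are good/bad, columns are suppliers 1, 2 and 3.
observed = [[45, 38, 47],
            [5, 12, 3]]
expected = expected_counts(observed)

for row in expected:
    print([round(e, 2) for e in row])
```

Note that each row and column of the expected table has the same total as the observed table, since only the way the counts are spread across cells changes.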
|
| Estimation |
The estimation is very simple. All we
do is make a table of the observed counts and then calculate the expected
counts as described above. |
| Testing |
The test is done using a Chi-Square goodness-of-fit
test according to the following formula:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(N_{ij} - E_{ij})^2}{E_{ij}}$$
where the summation is across all of the cells in the table. |
|
Given the assumptions stated below, this statistic has approximately a chi-square
distribution and is compared against a chi-square table with (r-1)(s-1) degrees of
freedom (1 degree of freedom for a 2 x 2 table). If the value of the test statistic
is less than the chi-square value for a given level of confidence, then there is no
evidence against independence of the classifying variables; otherwise there is
evidence of an interaction between them. |
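As a rough sketch of the test, the statistic and its degrees of freedom can be computed as follows. The function name `chi_square_test` and the counts are hypothetical. The critical value is passed in by the caller; the 4.605 used here is the 90% chi-square value for 2 degrees of freedom, which is what an r x s table with r = 2 and s = 3 gives under the usual (r-1)(s-1) rule.

```python
def chi_square_test(observed, critical_value):
    """Return (statistic, degrees of freedom, reject independence?)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count E_ij
            statistic += (count - e) ** 2 / e       # one cell's contribution

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, statistic > critical_value


# Hypothetical data: rows are good/bad, columns are suppliers 1, 2 and 3.
observed = [[45, 38, 47],
            [5, 12, 3]]
statistic, dof, reject = chi_square_test(observed, critical_value=4.605)
print(f"chi-square = {statistic:.3f}, dof = {dof}, reject independence: {reject}")
```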
|
| Assumptions |
The estimation and testing results above
hold regardless of whether the sampling model is Poisson, multinomial, or
product-multinomial. The chi-square results start to break down if the
counts in any cell are small, say < 5 . |
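A quick check for small cells might look like the sketch below. It assumes the rule of thumb is applied to the expected counts, which is the common convention; the text does not say whether observed or expected counts are meant. The function name `small_expected_cells` and the data are hypothetical.

```python
def small_expected_cells(observed, threshold=5.0):
    """List (row, column, expected count) for cells with expected count below threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < threshold:
                flagged.append((i, j, e))
    return flagged


# Hypothetical small sample where the chi-square approximation is doubtful.
observed = [[12, 3],
            [9, 1]]
flagged = small_expected_cells(observed)
for i, j, e in flagged:
    print(f"cell ({i}, {j}) has expected count {e:.1f}")
```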
| Uses |
The contingency table method is really
just a test of interaction between discrete explanatory variables for discrete
responses. The example given below is for two factors. The methods are
equally applicable to more factors but as with any interaction, as you
add more factors, the interpretation of the results becomes more difficult. |
| Example |
Suppose we are comparing the yield from two manufacturing
processes. We want to know if one process yields better than the
other one. |
| Make table of counts. |
|           | Good | Bad | Totals |
| Process A |   86 |  14 |    100 |
| Process B |   80 |  20 |    100 |
| Totals    |  166 |  34 |    200 |
Table 1. Yields for two production processes
|
|
|
We obtain the expected values by the formula given above. This
gives the table below. |
| Calculate expected counts. |
|           | Good | Bad | Totals |
| Process A |   83 |  17 |    100 |
| Process B |   83 |  17 |    100 |
| Totals    |  166 |  34 |    200 |
Table 2. Expected values for two production processes
|
|
| Calculate chi-square statistic and compare to
table value. |
The chi-square statistic works out to be equal to about 1.3: each cell differs
from its expected count by 3, so the statistic is 2(9/83) + 2(9/17), or about 1.28.
This is below the chi-square value for 1 degree of freedom and 90% confidence of
2.71 . Therefore we conclude that the two different processes did not have
a significant impact on yield. |
| Conclusion |
There is no statistically significant difference in yield
between the two processes. |
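The example can be reproduced with a short script, shown here as a sketch rather than a definitive implementation. The counts and the 2.71 critical value are taken from the example; the p-value line is an addition that uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so it is valid only for a 2 x 2 table.

```python
import math

# Observed counts from Table 1: rows are Process A and B, columns are Good and Bad.
observed = [[86, 14],
            [80, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under no interaction (Table 2).
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic summed over all four cells.
statistic = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)

critical_value = 2.71  # 90% confidence, 1 degree of freedom (from the example)
significant = statistic > critical_value

# Upper-tail probability for 1 degree of freedom only: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2.0))

print(f"expected counts: {expected}")
print(f"chi-square = {statistic:.2f}, significant at 90%: {significant}")
print(f"approximate p-value = {p_value:.2f}")
```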