3. Production Process Characterization
3.2. Assumptions / Prerequisites

3.2.4. Discrete Models

Description There are many instances in which the responses we need to analyze are discrete rather than continuous. Examples include yield (good/bad), speed bins (slow/fast/faster/fastest), and survey results (favor/oppose). We then try to explain the discrete outcomes with some combination of discrete or continuous explanatory variables. In this situation, the modeling techniques we have learned so far (CLM and ANOVA) are no longer appropriate.
There are two primary methods at our disposal for the analysis of discrete response data. The first applies to the case where we have discrete explanatory variables and discrete responses and is known as Contingency Table Analysis. This model is covered in detail in this section. The second applies to the case where we have both discrete and continuous explanatory variables and is referred to as a Log-Linear Model. This model is beyond the scope of this book, but interested readers should refer to the reference section of this chapter for a list of useful books on the topic.
Model Suppose we have N individuals that we classify according to two criteria, A and B. Suppose there are r levels of criterion A and s levels of criterion B. These responses can then be displayed in an r x s table. For example, suppose we have a box of manufactured parts that we classify as to whether they are good or bad and whether they came from supplier 1, 2 or 3, which gives a 2 x 3 table.
Now, each cell of this table will have a count of the individuals that fall into its particular combination of classification levels. Let's call this count Nij. The sum of all of these counts will be equal to the total number of individuals N. Also, each row of the table will sum to Ni. and each column will sum to N.j.
Under the assumption that there is no interaction between the two classifying variables (for example, the proportion of good parts does not depend on which supplier they came from), we can calculate the counts we would expect to see in each cell. Let's call the expected count for any cell Eij. Then the expected value for a cell is calculated according to Eij = Ni. * N.j / N. All we need to do then is to compare the expected counts to the observed counts. If the observed counts differ from the expected counts by more than chance alone would explain, then the two variables interact in some way.
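As a minimal sketch of this calculation, assuming the observed counts are stored as a plain list of rows, the expected counts can be computed from the row and column totals in Python. The function name `expected_counts` and the supplier counts below are made up for illustration and do not come from the handbook.

```python
def expected_counts(observed):
    """Return expected cell counts E_ij = N_i. * N_.j / N under independence.

    `observed` is a list of rows, each a list of observed cell counts N_ij.
    """
    row_totals = [sum(row) for row in observed]        # N_i.
    col_totals = [sum(col) for col in zip(*observed)]  # N_.j
    n_total = sum(row_totals)                          # N
    return [[r * c / n_total for c in col_totals] for r in row_totals]


# Hypothetical counts (illustration only): suppliers 1-3 as rows,
# good/bad parts as columns.
observed = [[45, 5], [38, 12], [47, 3]]
expected = expected_counts(observed)
```

Each row of `expected` sums to the same total as the matching row of `observed`, because the expected counts only redistribute each row total in proportion to the column totals.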
Estimation The estimation is very simple. All we do is make a table of the observed counts and then calculate the expected counts as described above.
Testing The test is done using a Chi-Square goodness-of-fit test according to the following formula:

$$\chi^2 = \sum_{i}\sum_{j} \frac{(N_{ij} - E_{ij})^2}{E_{ij}}$$

where the summation is across all of the cells in the table.

Given the assumptions stated below, this statistic has approximately a chi-square distribution and is compared against a chi-square table with (r-1)(s-1) degrees of freedom, with r and s as previously defined. If the value of the test statistic is less than the chi-square value for a given level of confidence, then we do not reject the hypothesis that the classifying variables are independent; otherwise we conclude that there is an interaction between them.
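The test statistic and its degrees of freedom can be sketched in a few lines of Python, assuming the observed and expected counts are already available as lists of rows. The function names and the small 2 x 2 table are hypothetical and serve only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Sum (N_ij - E_ij)^2 / E_ij over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """Return (r - 1) * (s - 1) for an r x s table of counts."""
    r = len(observed)     # number of rows
    s = len(observed[0])  # number of columns
    return (r - 1) * (s - 1)


# Hypothetical 2 x 2 table (illustration only). Every row and column total
# is 30 out of 60, so each expected count is 30 * 30 / 60 = 15.
observed = [[10, 20], [20, 10]]
expected = [[15.0, 15.0], [15.0, 15.0]]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
```

The statistic is then compared with the tabulated chi-square value for `df` degrees of freedom at the chosen confidence level.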
Assumptions The estimation and testing results above hold regardless of whether the sampling model is Poisson, multinomial, or product-multinomial. The chi-square approximation starts to break down if the counts in any cell are small, say < 5.
Uses The contingency table method is really just a test of interaction between discrete explanatory variables for discrete responses. The example given below is for two factors. The methods are equally applicable to more factors but as with any interaction, as you add more factors, the interpretation of the results becomes more difficult.
Example Suppose we are comparing the yield from two manufacturing processes. We want to know if one process yields better than the other.
Make table of counts.
             Good    Bad    Totals
Process A      86     14       100
Process B      80     20       100
Totals        166     34       200
Table 1. Yields for two production processes
We obtain the expected values by the formula given above. For example, the expected count of good parts from Process A is 100 * 166 / 200 = 83. This gives the table below.
Calculate expected counts.
             Good    Bad    Totals
Process A      83     17       100
Process B      83     17       100
Totals        166     34       200
Table 2. Expected values for two production processes
Calculate chi-square statistic and compare to table value. The chi-square statistic works out to about 1.28. This is below the chi-square value of 2.71 for 1 degree of freedom and 90% confidence. Therefore we conclude that the choice of process did not have a significant impact on yield.
Conclusion We conclude that there is no statistically significant difference in yield between the two processes.
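The arithmetic of this example can be checked with a short Python sketch that uses only the standard library. The counts are those of Table 1 and the critical value 2.71 is the one quoted in the text. The p-value line is an addition that is not part of the handbook example; it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

# Table 1: rows are Process A and Process B, columns are Good and Bad.
observed = [[86, 14], [80, 20]]

row_totals = [sum(row) for row in observed]        # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [166, 34]
n_total = sum(row_totals)                          # 200

# Expected counts under independence: E_ij = N_i. * N_.j / N
expected = [[r * c / n_total for c in col_totals] for r in row_totals]

# Chi-square statistic summed over all four cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Valid for 1 degree of freedom only: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

critical_value = 2.71  # 90% chi-square value for 1 degree of freedom (from the text)
reject_independence = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, reject independence: {reject_independence}")
```

Because the statistic falls below the critical value, `reject_independence` is `False`, which matches the conclusion drawn above.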