|
7.2.1. Do the observations come from a particular distribution? |
|
| Data are often assumed to come from a particular distribution. | Goodness-of-fit tests indicate whether or not it is reasonable to assume that a random sample comes from a specific distribution. Statistical techniques often rely on observations following a specific distribution (e.g., normal, lognormal, Poisson). Standard control charts, for instance, often assume that the data (or at least the plotted sample means) come from an approximately normal distribution. In reliability applications, accurate lifetime modeling generally requires specifying the correct distributional model. There may also be historical or theoretical reasons to assume a sample comes from a particular population. Past data may have consistently fit a known distribution, for example, or theory may predict that the underlying population should be of a specific form. |
| Hypothesis Test model for Goodness-of-fit |
Goodness-of-fit tests are a form of hypothesis testing where the null and alternative hypotheses are
H0: Sample data come from the stated distribution.
Ha: Sample data do not come from the stated distribution.
|
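As an illustration of how such a test reaches a decision, the following minimal sketch computes a Kolmogorov-Smirnov-type statistic (the largest gap between the empirical and the hypothesized cumulative distribution functions) and compares it with a critical value. The function names, the standard normal null distribution, and the large-sample 5% approximation 1.358/sqrt(n) are assumptions chosen for this example, not part of the handbook text.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF and the hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def reject_h0(sample, cdf, coeff=1.358):
    """True if H0 is rejected at roughly the 5% level (large-sample approximation)."""
    return ks_statistic(sample, cdf) > coeff / math.sqrt(len(sample))

rng = random.Random(0)  # fixed seed so the illustration is repeatable
standard_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted_sample = [rng.gauss(1.0, 1.0) for _ in range(200)]

print("standard sample rejected:", reject_h0(standard_sample, normal_cdf))
print("shifted sample rejected: ", reject_h0(shifted_sample, normal_cdf))
```

A large statistic is evidence against H0. A small statistic only means the data are consistent with the stated distribution; it does not prove that the distribution is correct.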
| Parameters may be assumed or estimated from the data |
One needs to consider whether a simple or composite hypothesis is being tested. For a simple hypothesis, the values of the distribution's parameters are specified before the sample is drawn. For a composite hypothesis, one or more of the parameters are unknown. Often, these parameters are estimated from the sample observations.
A simple hypothesis would be:
H0: Data are from a normal distribution with specified values of the mean and standard deviation.
A composite hypothesis would be:
H0: Data are from a normal distribution with unknown mean and standard deviation.
Composite hypotheses are more common because they allow us to decide whether a sample comes from any distribution of a specific type. In this situation, the form of the distribution is of interest, regardless of the values of the parameters. Unfortunately, composite hypotheses are more difficult to work with because the critical values are often hard to compute. |
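One way to see why composite hypotheses need their own critical values is to simulate them. The sketch below, using only the Python standard library, estimates the 5% critical value of a Kolmogorov-Smirnov-type statistic for normal samples twice: once with the parameters fixed in advance (simple hypothesis) and once with the mean and standard deviation estimated from each sample (composite hypothesis). The sample size, number of simulations, seed, and function names are assumptions made for illustration.

```python
import math
import random
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, mu, sigma):
    """Largest gap between the empirical CDF and a normal(mu, sigma) CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def simulated_critical_value(n, estimate_params, n_sims=2000, alpha=0.05, seed=1):
    """Monte Carlo estimate of the upper-alpha critical value when H0 is true."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if estimate_params:
            # Composite hypothesis: parameters are estimated from the same sample.
            mu, sigma = statistics.mean(sample), statistics.stdev(sample)
        else:
            # Simple hypothesis: parameters were fixed before sampling.
            mu, sigma = 0.0, 1.0
        stats.append(ks_statistic(sample, mu, sigma))
    stats.sort()
    index = min(n_sims - 1, int((1 - alpha) * n_sims))
    return stats[index]

crit_simple = simulated_critical_value(20, estimate_params=False)
crit_composite = simulated_critical_value(20, estimate_params=True)
print(f"simple: {crit_simple:.3f}  composite: {crit_composite:.3f}")
```

Because the fitted distribution is pulled toward the sample, the statistic tends to be smaller under the composite hypothesis. Using critical values derived for the simple case would therefore make the test reject less often than its nominal level suggests.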
| Problems with censored data | A second issue that affects a test is whether the data are censored. When data are censored, sample values are in some way restricted. Censoring occurs if the range of potential values is limited such that values from one or both tails of the distribution are unavailable (e.g., right and/or left censoring, where high and/or low values are missing). Censoring frequently occurs in reliability testing, when either the testing time or the number of failures to be observed is fixed in advance. A thorough treatment of goodness-of-fit testing under censoring is beyond the scope of this document. See D'Agostino & Stephens (1986) for more details. |
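The two reliability-testing situations described above can be sketched as follows. Fixing the test time in advance is usually called Type I (time) censoring, and fixing the number of failures is usually called Type II (failure) censoring. The function names, the exponential lifetimes, the hypothetical mean life of 1000 hours, and the cutoffs are assumptions made for this illustration only.

```python
import random

def type_i_censor(lifetimes, test_time):
    """Time censoring: the test stops at test_time.

    Returns (recorded_time, failure_observed) pairs; units still running
    at test_time are recorded at test_time with failure_observed=False.
    """
    return [(min(t, test_time), t <= test_time) for t in lifetimes]

def type_ii_censor(lifetimes, n_failures):
    """Failure censoring: the test stops at the n_failures-th failure."""
    ordered = sorted(lifetimes)
    stop_time = ordered[n_failures - 1]
    return [(min(t, stop_time), t <= stop_time) for t in ordered]

rng = random.Random(42)  # fixed seed so the illustration is repeatable
# Hypothetical exponential lifetimes with a mean of 1000 hours.
lifetimes = [rng.expovariate(1.0 / 1000.0) for _ in range(10)]

time_censored = type_i_censor(lifetimes, test_time=800.0)
failure_censored = type_ii_censor(lifetimes, n_failures=4)

for recorded, observed in time_censored:
    print(f"{recorded:8.1f}  {'failed' if observed else 'censored'}")
```

In both cases the upper tail of the lifetime distribution is never observed, which is why a goodness-of-fit test designed for complete samples cannot be applied unchanged.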
| Three types of tests will be covered | Three goodness-of-fit tests are examined in
detail:
A more extensive treatment of goodness-of-fit techniques is presented in D'Agostino & Stephens (1986). Along with the tests mentioned above, that reference examines other general and specific tests, including tests based on regression and graphical techniques. |