|
1. Exploratory Data Analysis
1.4. EDA Case Studies
|
|||
| Purpose |
The purpose of the first eight case studies is to show how EDA
graphics and quantitative measures and tests are applied to data
from scientific processes, and to critique those data with regard to
the following assumptions that typically underlie a measurement
process; namely, that the data behave like:
  1. random drawings;
  2. from a fixed distribution;
  3. with the distribution having a fixed location; and
  4. with the distribution having a fixed variation.
|
||
| Y_i = C + E_i |
If the above assumptions are satisfied, the process is said to be
statistically "in control", with the core characteristic of having
"predictability"; that is, probability statements can be made about
the process, not only in the past, but also in the future.
An appropriate model for an "in control" process is
    Y_i = C + E_i
where C is a constant and E_i is the error term.
The constant C is the "typical value" of the process--it is the primary summary number which shows up on any report. Although C is (assumed) fixed, it is unknown, and so a primary analysis objective of the engineer is to arrive at an estimate of C. This goal partitions into 4 sub-goals:
|
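The estimation of C and its uncertainty can be sketched in a few lines. This is only an illustration under stated assumptions: the simulated process (C = 10, normal errors with standard deviation 0.5), the function name `estimate_constant`, and the choice of the sample mean with standard error s / sqrt(N) are hypothetical choices, not taken from the case studies. The median is printed alongside as one possible alternative estimator.

```python
import math
import random
import statistics

def estimate_constant(y):
    """Return the sample mean and its standard error s / sqrt(N)."""
    n = len(y)
    mean = statistics.fmean(y)
    s = statistics.stdev(y)          # sample standard deviation (N - 1 divisor)
    return mean, s / math.sqrt(n)

# Hypothetical in-control process: Y_i = C + E_i with C = 10 and
# E_i drawn independently from a normal distribution with sd 0.5.
rng = random.Random(0)
C_TRUE = 10.0
y = [C_TRUE + rng.gauss(0.0, 0.5) for _ in range(500)]

c_hat, se = estimate_constant(y)
median = statistics.median(y)        # an alternative location estimator
print(f"mean = {c_hat:.3f} +/- {se:.3f}, median = {median:.3f}")
```

The standard error formula is only trustworthy when the assumptions hold, which is why the case studies check them before reporting an estimate.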
||
| Assumptions not satisfied |
If one or more of the above assumptions is not satisfied, then
we use EDA techniques, or some mix of EDA and classical techniques,
to find a more appropriate model for the data, one whose error
component satisfies the assumptions.
If the data are not random, then we may investigate fitting some simple time series models to the data. If the constant location and scale assumptions are violated, we may need to investigate the measurement process to see if there is an explanation.
The assumptions above therefore remain relevant: for an appropriate model, the error component should follow them, and the criterion for validating a model, or for comparing competing models, is framed in terms of them. |
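As a rough illustration of the non-random case, the sketch below builds a random walk, fits a first-order autoregressive model by least squares, and checks that the residuals look far less correlated than the raw data. The data, the AR(1) form, and the helper names `lag1_autocorr` and `fit_ar1` are assumptions made for this example; the text does not prescribe a particular time series model.

```python
import random
import statistics

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(y)
    num = sum((a - mean) * (b - mean) for a, b in zip(y, y[1:]))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

def fit_ar1(y):
    """Least-squares fit of Y_i = a0 + a1 * Y_(i-1) + E_i; returns (a0, a1, residuals)."""
    x, t = y[:-1], y[1:]
    mx, mt = statistics.fmean(x), statistics.fmean(t)
    a1 = sum((u - mx) * (v - mt) for u, v in zip(x, t)) / sum((u - mx) ** 2 for u in x)
    a0 = mt - a1 * mx
    resid = [v - (a0 + a1 * u) for u, v in zip(x, t)]
    return a0, a1, resid

# Hypothetical non-random data: a random walk built from uniform steps.
rng = random.Random(1)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.uniform(-0.5, 0.5))

a0, a1, resid = fit_ar1(walk)
print(f"raw lag-1 autocorrelation:      {lag1_autocorr(walk):.3f}")
print(f"fitted slope a1:                {a1:.3f}")
print(f"residual lag-1 autocorrelation: {lag1_autocorr(resid):.3f}")
```

A residual autocorrelation near zero suggests that the error component of the fitted model is closer to random than the original series was.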
||
| Non-univariate data |
Although the case studies in this chapter concentrate on
univariate data, the assumptions above are relevant for
non-univariate data as well.
If the data are not univariate, then we are trying to find a model
whose error component, itself a univariate data set, satisfies the
assumptions above.
The load cell calibration case study in the process modeling chapter shows an example of this in the regression context. |
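In the regression context, the idea can be sketched as follows: fit a model, extract the residuals, and then treat the residuals as a univariate data set to be checked. The data here are simulated (they are not the load cell data), and the straight-line model and the helper name `fit_line` are assumptions for illustration only.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b1 = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b1 * mx, b1

# Hypothetical data (not the load cell data): response = 2 + 0.5 * x + noise.
rng = random.Random(2)
x = [float(i) for i in range(100)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The residuals form a univariate data set; compare location and scale
# between the two halves of the run as a crude constancy check.
first, second = resid[:50], resid[50:]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"half means: {statistics.fmean(first):.3f}, {statistics.fmean(second):.3f}")
print(f"half sds:   {statistics.stdev(first):.3f}, {statistics.stdev(second):.3f}")
```

Comparing the two halves is only a coarse check; it stands in for the fuller analysis that a case study would apply to the residuals.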
||
| First three case studies operate on data with known characteristics |
The first three case studies operate on data that are randomly
generated from the following distributions:
|
||
| Graphical methods that are applied to the data |
To test the underlying assumptions, each data set is analyzed using
four graphical methods which are particularly suited to this purpose:
Additional graphical techniques are used in certain case studies to develop models that do have error components that satisfy the underlying assumptions. |
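The four graphical methods are not listed above, so the sketch below is only an assumption about the kind of check involved. It computes the numbers behind two commonly used displays, a histogram and a normal probability plot, for simulated normal and uniform data, and prints them instead of drawing. The helper names, the bin count, and the plotting positions are illustrative choices.

```python
import math
import random
import statistics

def histogram_counts(y, bins=10):
    """Counts in equal-width bins spanning min(y)..max(y); assumes the values are not all equal."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in y:
        k = min(int((v - lo) / width), bins - 1)   # the maximum goes in the last bin
        counts[k] += 1
    return counts

def normal_ppcc(y):
    """Correlation between the sorted data and approximate normal order-statistic quantiles."""
    n = len(y)
    nd = statistics.NormalDist()
    # Plotting positions (i - 0.375) / (n + 0.25) are one common choice, assumed here.
    q = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    s = sorted(y)
    ms, mq = statistics.fmean(s), statistics.fmean(q)
    cov = sum((a - ms) * (b - mq) for a, b in zip(s, q))
    return cov / math.sqrt(sum((a - ms) ** 2 for a in s) * sum((b - mq) ** 2 for b in q))

# Hypothetical data sets standing in for the random number case studies.
rng = random.Random(3)
normal_data = [rng.gauss(0.0, 1.0) for _ in range(500)]
uniform_data = [rng.random() for _ in range(500)]

for name, data in (("normal", normal_data), ("uniform", uniform_data)):
    print(name, histogram_counts(data), f"ppcc = {normal_ppcc(data):.4f}")
```

A correlation close to 1 corresponds to a nearly straight normal probability plot; the uniform data should give a visibly lower value and a flatter histogram.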
||
| Quantitative methods that are applied to the data |
The normal and uniform random number data sets are also analyzed with
the following quantitative techniques which are explained in more
detail in an earlier section:
Although the graphical methods applied to the normal and uniform random numbers are sufficient to assess the validity of the underlying assumptions, the quantitative techniques are used to show the differing flavor of the graphical and quantitative approaches. The remaining case studies intermix one or more of these quantitative techniques into the analysis where appropriate. |
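To give a sense of the quantitative flavor, the sketch below computes Bartlett's statistic for equal variances across consecutive groups of a data set. Bartlett's test is an assumed example, since the list of techniques is not reproduced above. The grouping into four blocks and the simulated data are also hypothetical.

```python
import math
import random
import statistics

def bartlett_statistic(groups):
    """Bartlett's statistic for equality of variances across k groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]   # N - 1 divisor
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical data: 500 standard normal values split into 4 consecutive groups.
rng = random.Random(4)
y = [rng.gauss(0.0, 1.0) for _ in range(500)]
groups = [y[i:i + 125] for i in range(0, 500, 125)]

t_stat = bartlett_statistic(groups)
CRITICAL_95 = 7.815   # upper 5% point of chi-square with k - 1 = 3 degrees of freedom
print(f"Bartlett statistic = {t_stat:.3f} (compare with {CRITICAL_95})")
```

A statistic above the critical value would cast doubt on the fixed variation assumption at the 5% level. A graphical method would show the same thing as a change in vertical spread over the run.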
||