1. Exploratory Data Analysis
1.2. EDA Assumptions
1.2.5. Consequences

1.2.5.4. Consequences Related to Distributional Assumptions

Distributional Analysis
Scientists and engineers routinely use the mean (average) to estimate the "middle" of a distribution. It is less well known that the variability and noisiness of the mean as a location estimator are intrinsically tied to the underlying distribution of the data. For certain distributions, the mean is a poor choice. For any given distribution, there exists an optimal choice, that is, the estimator with minimum variability/noisiness. This optimal choice may be, for example, the median, the midrange, the midmean, the mean, or something else. The implication is that one should "estimate" the distribution first and then, based on that distribution, choose the optimal estimator. The resulting engineering parameters will be more accurate and less uncertain.
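The dependence of estimator noisiness on the underlying distribution can be illustrated with a small simulation. The following is a minimal sketch, not a procedure from the handbook: the sample size, replication count, seed, and the helper names (`midrange`, `midmean`, `estimator_spread`, `best_estimator`) are all assumptions chosen for illustration. It draws many samples from a given distribution, applies each location estimator to every sample, and reports the standard deviation of the resulting estimates.

```python
import numpy as np


def midrange(x):
    # Average of the smallest and largest observation in each sample.
    return (np.min(x, axis=-1) + np.max(x, axis=-1)) / 2.0


def midmean(x):
    # Mean of the observations lying between the lower and upper quartiles
    # (approximated here by dropping n // 4 points from each end).
    xs = np.sort(x, axis=-1)
    n = xs.shape[-1]
    lo, hi = n // 4, n - n // 4
    return xs[..., lo:hi].mean(axis=-1)


ESTIMATORS = {
    "mean": lambda x: np.mean(x, axis=-1),
    "median": lambda x: np.median(x, axis=-1),
    "midrange": midrange,
    "midmean": midmean,
}

# Illustrative distributions, all symmetric about zero (assumed for this sketch).
SAMPLERS = {
    "uniform": lambda rng, size: rng.uniform(-1.0, 1.0, size),
    "normal": lambda rng, size: rng.normal(0.0, 1.0, size),
    "cauchy": lambda rng, size: rng.standard_cauchy(size),
}


def estimator_spread(sampler, n=100, reps=2000, seed=0):
    """Standard deviation of each estimator over `reps` samples of size `n`."""
    rng = np.random.default_rng(seed)
    samples = sampler(rng, (reps, n))  # one row per simulated sample
    return {name: float(np.std(est(samples))) for name, est in ESTIMATORS.items()}


def best_estimator(spread):
    # Estimator with the smallest simulated variability.
    return min(spread, key=spread.get)


if __name__ == "__main__":
    for dist_name, sampler in SAMPLERS.items():
        spread = estimator_spread(sampler)
        print(dist_name, best_estimator(spread), spread)
```

Under these assumptions, the simulation should favor the midrange for uniform data and the mean for normal data, while for heavy-tailed Cauchy data the mean and midrange become very noisy compared with the median and midmean.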

The airplane glass failure case study gives an example of determining an appropriate distribution and estimating the parameters of that distribution. The uniform random numbers case study gives an example of determining a more appropriate centrality parameter for a non-normal distribution.

Other consequences which flow from problems with distributional assumptions are:
Distribution
  1. The distribution may be changing.
  2. The single distribution estimate may be meaningless (if the process distribution is changing).
  3. The distribution may be non-normal.
  4. The distribution may be unknown.
  5. The true probability distribution for the error may remain unknown.
Model
  1. The model may be changing.
  2. The single model estimate may be meaningless.
  3. The default model
      Y = constant + error
    may be invalid.
  4. If the default model is insufficient, information about a better model may remain undetected.
  5. The wrong deterministic model may get fit.
  6. Information about an improved model may go undetected.
Process
  1. The process may be out-of-control.
  2. The process may be unpredictable.
  3. The process may be un-modelable.
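As a rough numerical screen for the first items in these lists (a changing distribution, or a default model Y = constant + error that does not hold), one can look for drift in run order. The sketch below is an assumption-laden illustration rather than the handbook's own method: the function name `drift_check`, the number of blocks, and the use of a straight-line fit against the run index are all choices made here. It fits Y = b0 + b1*t by least squares, reports the t-ratio of the slope, and summarizes location and spread over consecutive blocks of the data.

```python
import numpy as np


def drift_check(y, n_blocks=4):
    """Screen a run-ordered series for drift in location or spread."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size)

    # Least-squares line against run index; polyfit returns [slope, intercept].
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)

    # Standard error of the slope from the residual variance.
    resid_var = resid @ resid / (y.size - 2)
    slope_se = np.sqrt(resid_var / np.sum((t - t.mean()) ** 2))

    # Location and spread within consecutive blocks of the series.
    blocks = np.array_split(y, n_blocks)
    return {
        "slope_t": float(slope / slope_se),
        "block_means": [float(b.mean()) for b in blocks],
        "block_sds": [float(b.std(ddof=1)) for b in blocks],
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    stable = rng.normal(10.0, 1.0, 200)          # constant + error
    drifting = stable + 0.02 * np.arange(200)    # location changes over time
    print(drift_check(stable))
    print(drift_check(drifting))
```

A slope t-ratio that is large in magnitude, or block means and standard deviations that differ markedly, suggests that a single location or distribution estimate for the whole series may not be meaningful. A small t-ratio does not by itself establish that the default model holds, since the fit only looks for a linear trend.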