|
1. Exploratory Data Analysis
1.2. EDA Assumptions
1.2.5. Consequences
|
|||
| Distributional Analysis |
Scientists and engineers routinely use the mean (average)
to estimate the "middle" of a distribution. It is less well known
that the variability and noisiness of the mean as
a location estimator are intrinsically tied to the underlying
distribution of the data. For certain distributions, the mean
is a poor choice. For any given distribution, there exists an
optimal choice, that is, the estimator with minimum
variability/noisiness. This optimal choice may be, for example, the
median, the midrange, the midmean, the
mean, or something else. The implication is to
"estimate" the distribution first and then, based on that
distribution, choose the optimal estimator. The resulting
engineering parameters will be more accurate and less uncertain.
The airplane glass failure case study gives an example of determining an appropriate distribution and estimating the parameters of that distribution. The uniform random numbers case study gives an example of determining a more appropriate centrality parameter for a non-normal distribution. |
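The dependence of the best location estimator on the underlying distribution can be illustrated with a small simulation. The following is a minimal sketch, not part of the original handbook: the function names, the sample size, the number of trials, and the three example distributions are all assumptions chosen for illustration, and the midmean is approximated by trimming a quarter of the sorted sample from each end. For each distribution it draws many samples and reports how much each estimator varies from sample to sample; the estimator with the smallest spread is the preferable one for that distribution.

```python
import random
import statistics


def midrange(sample):
    """Average of the smallest and largest observations."""
    return (min(sample) + max(sample)) / 2


def midmean(sample):
    """Approximate midmean: mean of the central half of the sorted sample."""
    ordered = sorted(sample)
    k = len(ordered) // 4  # observations trimmed from each end
    return statistics.fmean(ordered[k:len(ordered) - k])


# Candidate location estimators named in the text.
ESTIMATORS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "midrange": midrange,
    "midmean": midmean,
}

# Illustrative distributions (assumed for this sketch), all centered at 0.
DISTRIBUTIONS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(-1.0, 1.0),
    "double exponential": lambda rng: rng.expovariate(1.0) * rng.choice((-1.0, 1.0)),
}


def estimator_variability(draw, n=50, trials=2000, seed=0):
    """Standard deviation of each estimator across repeated samples of size n.

    `draw(rng)` must return a single random observation.
    """
    rng = random.Random(seed)
    estimates = {name: [] for name in ESTIMATORS}
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        for name, estimator in ESTIMATORS.items():
            estimates[name].append(estimator(sample))
    return {name: statistics.pstdev(vals) for name, vals in estimates.items()}


if __name__ == "__main__":
    for label, draw in DISTRIBUTIONS.items():
        spread = estimator_variability(draw)
        best = min(spread, key=spread.get)
        summary = ", ".join(f"{k}={v:.4f}" for k, v in spread.items())
        print(f"{label}: {summary} -> least variable: {best}")
```

Under these assumptions, the mean should show the smallest spread for normal data and the midrange for uniform data, while for the heavier-tailed double exponential the median should be noticeably less variable than the mean. Exact figures depend on the seed, sample size, and number of trials.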
||
| Other consequences which flow from problems with distributional assumptions are: | |||
| Distribution |
|
||
| Model |
| ||
| Process |
|
||