|
1. Exploratory Data Analysis
1.3. EDA Techniques
|
|||
| Confirmatory Statistics |
The techniques discussed in this section are classical statistical
methods as opposed to EDA techniques. EDA and classical techniques
are not mutually exclusive and can be used in a complementary
fashion. For example, the analysis can start with some simple
graphical techniques such as the 4-plot followed by the classical
confirmatory methods discussed here to provide more rigorous statements
about the conclusions. If the classical methods yield different
conclusions than the graphical analysis, then some effort should be
invested to explain why. Often this is an indication that some of the
assumptions of the classical techniques are violated.
Many of the quantitative techniques fall into two broad categories:
|
||
| Interval Estimates |
It is common in statistics to estimate a parameter
from a sample of data. The value of the parameter using all the
data, not just the sampled points, is called the population
parameter or true value of the parameter. An estimate of the
true parameter value is made using the sample data. This
is called a point estimate or a sample estimate.
For example, the most commonly used measure of location is the mean. The population, or true, mean is the sum of all the members of the given population divided by the number of members in the population. As it is typically impractical to measure every member of the population, a random sample is drawn from the entire population. The sample mean is calculated by summing the values in the sample and dividing by the number of values in the sample. This sample mean is then used as the point estimate of the population mean.
Interval estimates expand on point estimates by providing an indication of the uncertainty of the point estimate. In the example for the mean above, different samples from the same population will generate different values for the sample mean. An interval estimate quantifies this uncertainty in the sample estimate by specifying lower and upper values of an interval which can be said, with a given level of confidence, to contain the population parameter. |
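As a minimal sketch of the distinction between a point estimate and an interval estimate, the following Python fragment computes both for a mean. The function name `mean_confidence_interval`, the simulated data, and the use of a normal (z) critical value are assumptions made for illustration; with small samples a t critical value would normally be used instead.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Assumption: a normal (z) critical value is used, which is only a
    reasonable approximation for fairly large samples.
    """
    n = len(sample)
    point_estimate = statistics.fmean(sample)              # sample mean
    std_error = statistics.stdev(sample) / math.sqrt(n)    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)          # two-sided critical value
    half_width = z * std_error
    return point_estimate, point_estimate - half_width, point_estimate + half_width


# Hypothetical data: 200 simulated measurements (not from the handbook).
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]

mean, lower, upper = mean_confidence_interval(sample)
print(f"point estimate of the mean: {mean:.3f}")
print(f"95% interval estimate:      ({lower:.3f}, {upper:.3f})")
```

Drawing a different sample (for example, by changing the seed) gives a different point estimate and a different interval, which is the sampling uncertainty the interval is meant to express.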
||
| Hypothesis Tests |
Hypothesis tests also address the uncertainty of the sample
estimate. However, instead of providing an interval, a
hypothesis test attempts to refute a specific claim about a
population parameter based on the sample data. For example,
the hypothesis might be one of the following:
To reject a hypothesis is to conclude that it is false. However, to accept a hypothesis does not mean that it is true, only that we do not have evidence to believe otherwise. Thus hypothesis tests are stated in terms of both an acceptable outcome (null) and an unacceptable outcome (alternative). A common format for a hypothesis test is: |
||
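The logic of rejecting or failing to reject a null hypothesis can be sketched in Python as a two-sided test of the claim that a population mean equals a stated value. This is an illustrative sketch, not a method prescribed by this section: the function name `one_sample_z_test`, the constructed data, and the large-sample normal approximation are all assumptions.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z_test(sample, mu0, alpha=0.05):
    """Two-sided test of H0: population mean == mu0.

    Assumption: the sample is large enough for the normal approximation
    to the test statistic to be reasonable.
    Returns (z, p_value, reject_null).
    """
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    z = (sample_mean - mu0) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical data: 110 values spread symmetrically around 10.0.
sample = [10.0 + 0.1 * (i % 11 - 5) for i in range(110)]

for mu0 in (10.0, 9.8):
    z, p_value, reject = one_sample_z_test(sample, mu0)
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"H0: mean = {mu0}: z = {z:.2f}, p = {p_value:.4f} -> {verdict}")
```

Note that the second outcome is reported as "fail to reject" rather than "accept", in keeping with the point above that not rejecting a hypothesis does not establish that it is true.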
| Practical Versus Statistical Significance | It is important to distinguish between statistical significance and practical significance. Statistical significance simply means that we reject the null hypothesis. The ability of the test to detect differences that lead to rejection of the null hypothesis depends on the sample size. For example, for a particularly large sample, the test may reject the null hypothesis that two processes are equivalent. However, in practice the difference between the two processes may be so small that it has no real engineering significance. Similarly, if the sample size is small, a difference that is large in engineering terms may not lead to rejection of the null hypothesis. The analyst should not blindly apply the tests, but should combine engineering judgement with statistical analysis. | ||
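The dependence on sample size can be illustrated with a short Python sketch. It assumes a two-sample z-test with a known, common standard deviation, which is a simplification chosen for illustration; the function name `two_sample_z_p_value` and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(observed_diff, sigma, n_per_group):
    """Two-sided p-value for H0: the two process means are equal.

    Assumptions: both groups have the same known standard deviation
    `sigma` and the same number of observations `n_per_group`.
    """
    std_error = sigma * math.sqrt(2 / n_per_group)
    z = observed_diff / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
# (observed difference in means, observations per group); sigma = 1.0 throughout.
cases = [
    (0.01, 1_000_000),  # tiny difference, very large sample
    (0.01, 50),         # same tiny difference, modest sample
    (0.50, 5),          # large difference, very small sample
]
for diff, n in cases:
    p = two_sample_z_p_value(diff, sigma=1.0, n_per_group=n)
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"difference = {diff:.2f}, n = {n:>9,}: {verdict}")
```

Under these assumptions the tiny difference is statistically significant only with the very large sample, while the much larger difference is not detected with five observations per group. Whether either difference matters is an engineering question that the test cannot answer.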
| Bootstrap Uncertainty Estimates | In some cases, it is possible to mathematically derive appropriate uncertainty intervals. This is particularly true for intervals based on the assumption of normal distributions for the data. However, there are many cases in which it is not possible to mathematically derive the uncertainty. In these cases, the bootstrap provides a method for empirically determining an appropriate interval. | ||
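As a rough sketch of the idea, the following Python fragment builds a percentile bootstrap interval for a median by resampling the observed data with replacement. The percentile method is one of several bootstrap variants and is used here as an assumption; the function name `bootstrap_interval`, the number of resamples, and the data are hypothetical.

```python
import random
import statistics


def bootstrap_interval(sample, statistic, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` of the population.

    Each resample draws len(sample) values from the sample with
    replacement; the interval is read from the sorted resampled statistics.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1 - confidence
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


# Hypothetical skewed data, where a normal-theory interval may be questionable.
sample = [1.2, 0.8, 3.5, 0.9, 1.1, 7.4, 1.0, 2.2, 0.7, 1.5, 1.3, 4.8, 0.6, 1.9, 1.0]

lower, upper = bootstrap_interval(sample, statistics.median)
print(f"sample median: {statistics.median(sample)}")
print(f"95% bootstrap interval for the median: ({lower}, {upper})")
```

The same function can be applied to other statistics by passing a different callable, for example `statistics.fmean`.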
| Table of Contents |
Some of the more common classical quantitative techniques are listed
below. This list of quantitative techniques is by no means meant
to be exhaustive. Additional discussions of classical statistical
techniques are contained in the
product comparisons chapter.
|
||