1. Exploratory Data Analysis
1.1. EDA Introduction
1.1.7. General Problem Categories
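The following table classifies EDA problems into general categories.
For each category it lists the data, the model, the desired output,
and the techniques commonly applied.
Problem Classification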
UNIVARIATE
Data: A single column of numbers, Y.
Model: y = constant + error
Output:
1) A number (the estimated constant in the model).
2) An estimate of uncertainty for the constant.
3) An estimate of the distribution for the error.
Techniques: 4-Plot, Probability Plot, PPCC Plot
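The first two univariate outputs can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method. It assumes the sample mean is an adequate estimate of the constant and that the errors are uncorrelated with fixed variation, so that the standard error is s / sqrt(n). The function name `summarize_univariate` and the data values are hypothetical.

```python
import math
import statistics


def summarize_univariate(y):
    """Estimate the constant in y = constant + error, with its uncertainty.

    Assumption: the sample mean is used as the estimate of the constant,
    and the errors are uncorrelated with constant variation.
    """
    n = len(y)
    constant = statistics.mean(y)
    sd = statistics.stdev(y)                   # sample standard deviation
    std_err = sd / math.sqrt(n)                # uncertainty of the estimated constant
    residuals = [v - constant for v in y]      # estimated errors, for distributional checks
    return constant, std_err, residuals


# Hypothetical measurements, for illustration only.
y = [9.8, 10.1, 10.0, 9.9, 10.2]
constant, std_err, residuals = summarize_univariate(y)
print(f"constant = {constant:.3f}, standard error = {std_err:.3f}")
```

The residuals returned here are what a probability plot or PPCC plot would examine to address the third output, the distribution of the error.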
CONTROL
Data: A single column of numbers, Y.
Model: y = constant + error
Output: A yes or no to the question "Is the system out of control?"
Techniques: Control Charts
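As a rough sketch of how a control chart yields a yes or no answer, the Python below computes a center line and limits from baseline data and flags any new point outside them. The 3-sigma multiplier is a common convention and is assumed here, as is estimating sigma from the sample standard deviation of the baseline. Neither is stated in the table above. The function names and data are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline data.

    Assumption: limits are center +/- k sample standard deviations,
    with k = 3 as the conventional default.
    """
    center = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return center - k * sd, center, center + k * sd


def out_of_control(points, lower, upper):
    """Answer "Is the system out of control?" for a set of new points."""
    return any(p < lower or p > upper for p in points)


# Hypothetical baseline and new observations, for illustration only.
baseline = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)
stable = out_of_control([10.1, 9.9, 10.0], lower, upper)
shifted = out_of_control([10.1, 11.5, 10.0], lower, upper)
print(stable, shifted)
```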
COMPARATIVE
Data: A single response variable and k independent variables
(Y, X1, X2, ..., Xk); the primary focus is on one of these
independent variables (the primary factor).
Model: y = f(x1, x2, ..., xk) + error
Output: A yes or no to the question "Is the primary factor significant?"
Techniques: Block Plot, Scatter Plot, Box Plot
SCREENING
Data: A single response variable and k independent variables
(Y, X1, X2, ..., Xk).
Model: y = f(x1, x2, ..., xk) + error
Output:
1) A ranked list (from most important to least important) of factors.
2) Best settings for the factors.
3) A good model/prediction equation relating Y to the factors.
Techniques: Block Plot, Probability Plot, Bihistogram
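One simple way to produce the ranked list of factors is to compare main effects, sketched below. It assumes a two-level design with factors coded -1 and +1, and it measures importance by the absolute difference between the mean response at the high and low settings. Both the coding and the ranking criterion are assumptions made for this sketch. The function name and data are hypothetical.

```python
import statistics


def rank_factors(runs, response):
    """Rank factors by the size of their main effect.

    runs:     list of dicts mapping factor name -> coded level (-1 or +1)
    response: list of observed Y values, one per run
    Assumption: main effect = mean(Y at +1) - mean(Y at -1).
    """
    effects = {}
    for name in runs[0]:
        high = [y for r, y in zip(runs, response) if r[name] == +1]
        low = [y for r, y in zip(runs, response) if r[name] == -1]
        effects[name] = statistics.mean(high) - statistics.mean(low)
    # Most important factor (largest absolute effect) first.
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical 2x2 full factorial design, for illustration only.
runs = [
    {"X1": -1, "X2": -1},
    {"X1": +1, "X2": -1},
    {"X1": -1, "X2": +1},
    {"X1": +1, "X2": +1},
]
response = [10.0, 14.0, 11.0, 15.0]
ranking = rank_factors(runs, response)
print(ranking)
```

If larger responses are better, the sign of each effect also suggests a setting for that factor: a positive effect favors the high level.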
OPTIMIZATION
Data: A single response variable and k independent variables
(Y, X1, X2, ..., Xk).
Model: y = f(x1, x2, ..., xk) + error
Output: Best settings for the factor variables.
Techniques: Block Plot, Least Squares Fitting, Contour Plot
REGRESSION
Data: A single response variable and k independent variables
(Y, X1, X2, ..., Xk).
The independent variables can be continuous.
Model: y = f(x1, x2, ..., xk) + error
Output: A good model/prediction equation relating Y to the factors.
Techniques: Least Squares Fitting, Scatter Plot, 6-Plot
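For the simplest regression case, one independent variable and a straight-line f, least squares fitting can be sketched as follows. The linear form y = b0 + b1*x is an assumption made for this example; the table leaves f general. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x.

    Assumption: a straight line is an adequate form for f.
    Returns (intercept, slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_line(x, y)

# Residuals are the estimated errors; a scatter plot or 6-plot would examine them.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```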
TIME SERIES
|
MULTIVARIATE
Data: k factor variables (X1, X2, ..., Xk).
Model: Model not explicit.
Output: Identify underlying correlation structure in the data.
Techniques: Star Plot, Profile Plot, Principal Components, Clustering,
Discrimination/Classification
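A pairwise correlation matrix is a simple first look at correlation structure before applying the techniques listed above. It is not one of those techniques, and it is offered here only as an illustrative starting point. The sketch uses the Pearson coefficient, which captures linear association only. The function names and data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every pair of columns."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}


# Hypothetical factor columns, for illustration only.
columns = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 10],
    "X3": [5, 3, 4, 1, 2],
}
corr = correlation_matrix(columns)
print(round(corr[("X1", "X2")], 3), round(corr[("X1", "X3")], 3))
```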