1. Exploratory Data Analysis
1.1. EDA Introduction

1.1.7. General Problem Categories

The following table is a convenient way to classify EDA problems.
Problem Classification
UNIVARIATE
Data: A single column of numbers, Y.
Model: y = constant + error
Output: 1) A number (the estimated constant in the model). 2) An estimate of uncertainty for the constant. 3) An estimate of the distribution for the error.
Techniques: 4-Plot, Probability Plot, PPCC Plot
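
As a minimal sketch of the first two univariate outputs, the constant can be estimated by the sample mean and its uncertainty by the standard error of the mean. This assumes the errors are independent with a fixed location and spread; the function name `fit_constant` and the data values are hypothetical and chosen only for illustration.

```python
import math
import statistics


def fit_constant(y):
    """Fit y = constant + error; return the estimate, its standard error, and the residuals."""
    n = len(y)
    constant = statistics.mean(y)      # estimated constant in the model
    sd = statistics.stdev(y)           # sample standard deviation of the data
    std_error = sd / math.sqrt(n)      # uncertainty of the estimated constant
    residuals = [v - constant for v in y]  # estimated errors, for distribution checks
    return constant, std_error, residuals


# Hypothetical data, not taken from the handbook.
y = [9.8, 10.1, 10.0, 9.9, 10.2]
constant, std_error, residuals = fit_constant(y)
print(f"constant = {constant:.3f}, standard error = {std_error:.3f}")
```

The residuals are what a probability plot or 4-plot would examine to estimate the distribution of the error.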
CONTROL
Data: A single column of numbers, Y.
Model: y = constant + error
Output: A yes or no to the question "Is the system out of control?".
Techniques: Control Charts
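
The yes-or-no output can be illustrated with a simplified control-limit check. This sketch assumes limits of the baseline mean plus or minus three sample standard deviations; production control charts often estimate the spread differently (for example from moving ranges), and the function name `out_of_control` and the data are hypothetical.

```python
import statistics


def out_of_control(baseline, new_points, k=3.0):
    """Flag new points that fall outside center +/- k * sigma of the baseline."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    lower, upper = center - k * sigma, center + k * sigma
    flagged = [v for v in new_points if v < lower or v > upper]
    return bool(flagged), (lower, upper), flagged


# Hypothetical in-control baseline followed by new observations.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9]
answer, limits, flagged = out_of_control(baseline, [10.1, 9.9, 10.9])
print("Out of control?", "yes" if answer else "no", flagged)
```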
COMPARATIVE
Data: A single response variable and k independent variables (Y, X1, X2, ..., Xk). The primary focus is on one of these independent variables (the primary factor).
Model: y = f(x1, x2, ..., xk) + error
Output: A yes or no to the question "Is the primary factor significant?".
Techniques: Block Plot, Scatter Plot, Box Plot
SCREENING
Data: A single response variable and k independent variables (Y, X1, X2, ..., Xk).
Model: y = f(x1, x2, ..., xk) + error
Output: 1) A ranked list (from most important to least important) of factors. 2) Best settings for the factors. 3) A good model/prediction equation relating Y to the factors.
Techniques: Block Plot, Probability Plot, Bihistogram
OPTIMIZATION
Data: A single response variable and k independent variables (Y, X1, X2, ..., Xk).
Model: y = f(x1, x2, ..., xk) + error
Output: Best settings for the factor variables.
Techniques: Block Plot, Least Squares Fitting, Contour Plot
REGRESSION
Data: A single response variable and k independent variables (Y, X1, X2, ..., Xk). The independent variables can be continuous.
Model: y = f(x1, x2, ..., xk) + error
Output: A good model/prediction equation relating Y to the factors.
Techniques: Least Squares Fitting, Scatter Plot, 6-Plot
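
For the simplest regression case, least squares fitting can be sketched with one independent variable and a straight-line form for f. Both the linear form y = intercept + slope * x and the function name `least_squares_line` are assumptions made for illustration; the model above allows a general f of k variables. The data are hypothetical.

```python
def least_squares_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = least_squares_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

The fitted intercept and slope together form the prediction equation that the regression category lists as its output.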
TIME SERIES
Data: A column of time-dependent numbers, Y. In addition, time is an independent variable. The time variable can be either explicit or implied. If the data are not equi-spaced, the time variable should be explicitly provided.
Model: y_t = f(t) + error
The model can be either time-domain based or frequency-domain based.
Output: A good model/prediction equation relating Y to previous values of Y.
Techniques: Autocorrelation Plot, Spectrum, Complex Demodulation Amplitude Plot, Complex Demodulation Phase Plot, ARIMA Models
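
The quantity behind an autocorrelation plot can be sketched as the sample autocorrelation at a given lag. This assumes equi-spaced data with an implied time variable; the function name `autocorrelation` and the data are hypothetical.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of an equi-spaced series y at the given lag."""
    n = len(y)
    y_bar = sum(y) / n
    c0 = sum((v - y_bar) ** 2 for v in y)  # lag-0 sum of squares
    ch = sum((y[t] - y_bar) * (y[t + lag] - y_bar) for t in range(n - lag))
    return ch / c0


# Hypothetical alternating series: strongly negative correlation at lag 1.
y = [1, -1, 1, -1, 1, -1]
r1 = autocorrelation(y, 1)
r2 = autocorrelation(y, 2)
print(f"lag 1: {r1:.3f}, lag 2: {r2:.3f}")
```

Values near zero at all nonzero lags suggest no dependence on previous values of Y, while large values indicate structure that a time series model could capture.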
MULTIVARIATE
Data: k factor variables (X1, X2, ..., Xk).
Model: Model not explicit.
Output: Identify underlying correlation structure in the data.
Techniques: Star Plot, Profile Plot, Principal Components, Clustering, Discrimination/Classification
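
One simple first look at correlation structure is the matrix of pairwise Pearson correlations among the factor variables. This is only a sketch and is not one of the techniques listed above, although principal components analysis typically starts from such a matrix. The function names `pearson` and `correlation_matrix` and the data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    saa = sum((v - a_bar) ** 2 for v in a)
    sbb = sum((v - b_bar) ** 2 for v in b)
    sab = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def correlation_matrix(columns):
    """Pairwise correlations for a list of columns X1, ..., Xk."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical factor variables: X2 rises with X1, X3 falls with X1.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
corr = correlation_matrix(columns)
for row in corr:
    print([round(r, 2) for r in row])
```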