1. Exploratory Data Analysis
1.2. EDA Assumptions
1.2.1. Underlying Assumptions?
|
Assumptions underlying a measurement process
|
There are four assumptions that typically underlie all measurement
processes; namely, that the data from the process at hand "behave
like":
- random drawings;
- from a fixed distribution;
- with that distribution having a fixed location; and
- with that distribution having a fixed variation.
|
|
Univariate or single response variable
|
The "fixed location" referred to in the third item above differs for
different problem types. The simplest problem type is univariate; that
is, a single variable. For the univariate problem, the general model
response = deterministic component + random component
becomes
response = constant + error
|
|
Assumptions for univariate model
|
For this case, the "fixed location" is simply the unknown constant.
We can thus imagine the process at hand to be operating under
constant conditions that produce a single column of data with the
properties that
- the data are uncorrelated with one another;
- the random component has a fixed distribution;
- the deterministic component consists only of a constant; and
- the random component has fixed variation.
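
As a rough illustration of the four properties above, the following minimal sketch simulates a hypothetical process of the form response = constant + error and computes simple numeric summaries. The constant (10.0), the normally distributed error, the sample size, and the helper names `lag1_autocorrelation` and `summarize_halves` are all assumptions made for this example; they are not part of the original text. Comparing the two halves of the data is only a crude check of fixed location and fixed variation, and the lag-1 autocorrelation is only one possible indicator of correlation between successive values.

```python
import random
import statistics


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; close to 0 for uncorrelated data."""
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


def summarize_halves(values):
    """Return (mean, standard deviation) for each half of the data."""
    middle = len(values) // 2
    first, second = values[:middle], values[middle:]
    return (
        (statistics.fmean(first), statistics.stdev(first)),
        (statistics.fmean(second), statistics.stdev(second)),
    )


# Hypothetical process: response = constant + error (assumed values).
rng = random.Random(0)
constant = 10.0
data = [constant + rng.gauss(0.0, 1.0) for _ in range(1000)]

(first_mean, first_sd), (second_mean, second_sd) = summarize_halves(data)
print("location by half: ", first_mean, second_mean)
print("variation by half:", first_sd, second_sd)
print("lag-1 autocorrelation:", lag1_autocorrelation(data))
```

If the assumptions hold, the two means should be similar, the two standard deviations should be similar, and the lag-1 autocorrelation should be close to zero.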
|
|
Extrapolation to a function of many variables
|
The power and importance of the univariate model are that it easily
extrapolates to the more general case, where the deterministic
component is not just a constant but is in fact a function of many
variables, and the engineering objective is to characterize and model
that function.
|
|
Residuals will behave according to univariate assumptions
|
The key point is that regardless of how many factors, and regardless
of how complicated the function, if the engineer succeeds in choosing
a good model, then the differences (residuals) between the raw
response data and the predicted values from the fitted model should
themselves behave like a univariate process. Further, the residuals
from such a fit will behave like:
- random drawings;
- from a fixed distribution;
- with fixed location (namely, 0 in this case); and
- with fixed variation.
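
The behavior of residuals described above can be sketched for the simplest non-constant case, a straight-line model fitted by ordinary least squares. The data-generating line (intercept 2.0, slope 3.0), the error standard deviation, and the helper names `fit_line` and `lag1_autocorrelation` are assumptions chosen for this illustration only.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; close to 0 for uncorrelated data."""
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Hypothetical data: response = (2 + 3 * x) + error (assumed values).
rng = random.Random(1)
x = [i / 10 for i in range(200)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("residual mean:", statistics.fmean(residuals))
print("residual sd by half:",
      statistics.stdev(residuals[:100]), statistics.stdev(residuals[100:]))
print("residual lag-1 autocorrelation:", lag1_autocorrelation(residuals))
```

Because the fitted model matches the form of the deterministic component, the residuals should have mean essentially 0, similar spread in both halves, and little lag-1 autocorrelation.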
|
|
Validation of model
|
Thus, if the residuals from the fitted model do in fact behave like
the ideal, then testing the underlying assumptions becomes a tool for
validating the chosen model and assessing its quality of fit.
On the other hand, if the residuals from the chosen fitted model
violate one or more of the above univariate assumptions, then
the chosen fitted model is inadequate and an opportunity exists for
arriving at an improved model.
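
To illustrate how residuals can expose an inadequate model, the sketch below fits two candidate models to the same hypothetical data, which drift upward over the run: a constant-only model and a straight-line model. The drift rate, noise level, and helper names are assumptions made for this example, and lag-1 autocorrelation is used here as just one possible check of the randomness assumption.

```python
import random
import statistics


def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation; close to 0 for uncorrelated data."""
    mean = statistics.fmean(values)
    deviations = [v - mean for v in values]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical run-ordered data with a drifting location (assumed values).
rng = random.Random(2)
t = list(range(200))
y = [5.0 + 0.05 * ti + rng.gauss(0.0, 1.0) for ti in t]

# Candidate 1: response = constant + error (ignores the drift).
constant = statistics.fmean(y)
constant_residuals = [yi - constant for yi in y]

# Candidate 2: response = intercept + slope * t + error.
intercept, slope = fit_line(t, y)
line_residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

constant_r1 = lag1_autocorrelation(constant_residuals)
line_r1 = lag1_autocorrelation(line_residuals)
print("constant-only model, residual lag-1 autocorrelation:", constant_r1)
print("straight-line model, residual lag-1 autocorrelation:", line_r1)
```

Under these assumptions, the residuals from the constant-only model are strongly autocorrelated, signaling that the model is inadequate, while the residuals from the straight-line model behave much more like random drawings with fixed location 0.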
|