1. Exploratory Data Analysis
1.1. EDA Introduction
Primary and Secondary Goals
The primary goal of EDA is to maximize the analyst's insight into
a data set and into its underlying structure. A secondary output
of EDA is the set of specific items that an analyst would want to
extract from a data set, such as:
Insight into the Data
Insight implies detecting and uncovering underlying structure in
the data. Such underlying structure may not be encapsulated
in the list of items above; such items serve as the specific targets
of an analysis, but the real insight and "feel" for a data set
comes as the analyst judiciously probes and explores the various
subtleties of the data. The "feel" for the data comes almost
exclusively from the application of various graphical
techniques--the collection of which serves as
the window into the essence of the data. Graphics are
irreplaceable--there are no quantitative analogues that will give the
same insight as well-chosen graphics.
To get a "feel" for the data, it is not enough for the analyst to know what is in the data; the analyst must also know what is not in the data. The only way to do that is to draw on human pattern-recognition and comparative abilities in the context of a series of judicious graphical techniques applied to the data.
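The claim that numerical summaries cannot replace graphics can be illustrated with a small sketch. The data below are Anscombe's quartet, a well-known set of four small x-y data sets; it is brought in here purely as an illustration and is not taken from this section. The helper names (`pearson_r`, `summarize`) are likewise hypothetical. The sketch computes the same summary statistics for each data set.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet: illustrative data, an assumption of this sketch
# (it is not part of the section being illustrated).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(x, y):
    """Return (mean of x, mean of y, correlation) for one data set."""
    return mean(x), mean(y), pearson_r(x, y)


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, (mx, my, r) in summaries.items():
    print(f"{name:>3}: mean(x)={mx:.2f}  mean(y)={my:.2f}  r={r:.3f}")
```

All four data sets give nearly identical means and correlations, yet scatter plots of the four pairs look entirely different: a roughly linear cloud, a smooth curve, a line with one outlier, and a vertical stack with a single high-leverage point. The summary table alone gives no hint of this, which is the kind of structure the text says only a plot reveals.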