1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.3. Graphical Techniques: Alphabetic
1.3.3.26. Scatter Plot

1.3.3.26.9. Scatter Plot: Variation of Y Does Depend on X (heteroscedastic)

Figure: Scatter plot showing heteroscedastic variability
Discussion

This scatter plot reveals an approximately linear relationship between X and Y but, more importantly, it reveals a statistical condition referred to as heteroscedasticity (that is, nonconstant variation). For a heteroscedastic data set, the vertical variation in Y differs depending on the value of X. In this example, small values of X yield small scatter in Y while large values of X result in large scatter in Y.
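
The pattern described above can be reproduced with simulated data. The following is a minimal sketch, not the handbook's data set: the model Y = 2 + 3*X, the noise standard deviation of 0.5*X, the sample size, and all variable names are assumptions chosen for illustration. It fits an unweighted line and compares the residual scatter for small and large X.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical model (an assumption, not the handbook's data):
# Y = 2 + 3*X + noise, where the noise standard deviation is 0.5*X.
n = 400
x = [random.uniform(1.0, 10.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

# Ordinary (unweighted) least-squares line through the points.
xbar, ybar = statistics.mean(x), statistics.mean(y)
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Split the residuals at the median X and compare their vertical scatter.
cut = statistics.median(x)
sd_low = statistics.stdev([r for xi, r in zip(x, residuals) if xi <= cut])
sd_high = statistics.stdev([r for xi, r in zip(x, residuals) if xi > cut])
print(f"residual SD for X <= median: {sd_low:.2f}")
print(f"residual SD for X >  median: {sd_high:.2f}")
```

Under these assumptions the second standard deviation should come out clearly larger than the first, which is the numerical counterpart of the widening band of points in the scatter plot.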

Heteroscedasticity complicates the analysis somewhat, but its effects can be overcome by:

  1. proper weighting of the data, with noisier data being weighted less, or by
  2. performing a Y variable transformation to achieve homoscedasticity. The Box-Cox normality plot can help determine a suitable transformation.
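
Both remedies can be sketched on one simulated data set. Everything here is an illustrative assumption: the data are hypothetical replicates at four X levels with multiplicative noise, the helper `weighted_line` is written for this example only, the weights are taken as 1/X**2 because the variance of Y is assumed proportional to X**2, and the log transformation (the Box-Cox transformation with lambda = 0) is simply assumed rather than selected from a Box-Cox normality plot.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: 100 replicates at each X level, with multiplicative
# noise so that the standard deviation of Y grows in proportion to X.
levels = [1.0, 2.0, 4.0, 8.0]
data = {x: [3.0 * x * math.exp(random.gauss(0.0, 0.2)) for _ in range(100)]
        for x in levels}

def weighted_line(x, y, w):
    """Return (slope, intercept) minimizing sum(w * (y - a - b*x)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Remedy 1: weight each point by 1 / variance. The variance is assumed
# proportional to X**2, so the weights are 1 / X**2 (noisier points count less).
xs = [x for x in levels for _ in data[x]]
ys = [y for x in levels for y in data[x]]
weights = [1.0 / x ** 2 for x in xs]
slope_w, intercept_w = weighted_line(xs, ys, weights)
print(f"weighted fit: Y = {intercept_w:.2f} + {slope_w:.2f} * X")

# Remedy 2: transform Y. The log is the Box-Cox transformation with lambda = 0.
sd_raw = {x: statistics.stdev(data[x]) for x in levels}
sd_log = {x: statistics.stdev([math.log(y) for y in data[x]]) for x in levels}
for x in levels:
    print(f"X = {x:4.1f}  SD(Y) = {sd_raw[x]:5.2f}  SD(log Y) = {sd_log[x]:.3f}")
```

With data of this form, SD(Y) should rise with X while SD(log Y) stays roughly constant across the levels, which is what achieving homoscedasticity by transformation means. For real data the appropriate weights or transformation depend on how the variability actually changes with X.
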
Impact of ignoring unequal variability in the data

Fortunately, unweighted regression analyses on heteroscedastic data produce estimates of the coefficients that are unbiased. However, the coefficient estimates will not be as precise as they would be with proper weighting. Note that weighting is recommended only if the weights are known or if there is sufficient reason to assume that they are of a certain form; for example, it may be known that the variability of a process increases in proportion to X or varies inversely with X.
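
The claim about unbiasedness and precision can be checked with a small simulation. This is a sketch under stated assumptions, not a general proof: the true line Y = 2 + 3*X, the noise standard deviation of 0.5*X, the 30 design points, the 2000 repetitions, and the helper `fit_line` are all hypothetical choices, and the weights 1/X**2 are treated as known.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_INTERCEPT, TRUE_SLOPE = 2.0, 3.0
x = [1.0 + 9.0 * i / 29 for i in range(30)]  # 30 evenly spaced X values in [1, 10]

def fit_line(x, y, w):
    """Return (slope, intercept) minimizing sum(w * (y - a - b*x)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

equal_w = [1.0] * len(x)                # unweighted fit: every point counts equally
inv_var_w = [1.0 / xi ** 2 for xi in x]  # assumed known form: variance proportional to X**2

unweighted_slopes, weighted_slopes = [], []
for _ in range(2000):
    # Noise standard deviation grows in proportion to X (heteroscedastic).
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0.0, 0.5 * xi) for xi in x]
    unweighted_slopes.append(fit_line(x, y, equal_w)[0])
    weighted_slopes.append(fit_line(x, y, inv_var_w)[0])

for name, slopes in [("unweighted", unweighted_slopes), ("weighted", weighted_slopes)]:
    print(f"{name:>10}: mean slope = {statistics.mean(slopes):.3f}, "
          f"SD of slope = {statistics.stdev(slopes):.3f}")
```

Both mean slopes should land close to the true value of 3, consistent with unbiasedness, while the weighted slopes should show the smaller standard deviation, consistent with the loss of precision from ignoring the unequal variability.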