1. Exploratory Data Analysis
1.3. EDA Techniques
1.3.3. Graphical Techniques: Alphabetic

1.3.3.24.

Quantile-Quantile Plot

Purpose:
Check if two data sets come from a common distribution
The quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from a common distribution.

A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the value below which a given fraction (or percent) of the points fall. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
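The definition of a quantile can be made concrete with a small sketch. The function name `quantile` and the linear-interpolation rule between sorted values are assumptions made for illustration; statistical packages use several slightly different estimation rules, and the text above does not prescribe one.

```python
def quantile(data, p):
    """Estimate the p-quantile (0 <= p <= 1) of data.

    Assumption: linear interpolation between adjacent sorted values.
    Other interpolation rules give slightly different answers.
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(data)
    pos = p * (len(xs) - 1)          # fractional index into the sorted data
    lo = int(pos)                    # sorted value at or just below the position
    hi = min(lo + 1, len(xs) - 1)    # next sorted value (clamped at the end)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


values = [12, 7, 3, 18, 9, 15, 5, 21, 10, 14, 1]
print("0.3 quantile:", quantile(values, 0.3))
print("median:      ", quantile(values, 0.5))
```

With this rule, roughly 30% of the values lie below the 0.3 quantile; for small samples the fraction is only approximate.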

A 45 degree reference line is also plotted. If the two sets come from the same distribution, the points should fall along this reference line. The greater the departures from this reference line, the greater the evidence for the conclusion that the two data sets come from different distributions.

The advantages of the q-q plot are:

  1. The sample sizes do not need to be equal.
  2. Many distributional aspects can be simultaneously tested. For example, shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can all be detected from this plot. In particular, if the two data sets come from distributions that differ only by a shift in location, the points should lie along a straight line that is displaced either up or down from the 45 degree reference line.
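The location-shift case in the second item can be checked numerically. The following is a minimal sketch on simulated data: the sample size, the normal distributions, the shift of 8.0, and the helper `least_squares_line` are all assumptions chosen for illustration. Because the two samples have equal size, the q-q points are taken to be the sorted values of one sample paired with the sorted values of the other.

```python
import random


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)   # fixed seed so the sketch is repeatable
shift = 8.0              # hypothetical location shift between the batches

# Equal-size samples: sorting each one gives the q-q points directly.
batch2 = sorted(rng.gauss(50.0, 5.0) for _ in range(200))          # horizontal axis
batch1 = sorted(rng.gauss(50.0 + shift, 5.0) for _ in range(200))  # vertical axis

slope, intercept = least_squares_line(batch2, batch1)
mean_offset = sum(y - x for x, y in zip(batch2, batch1)) / len(batch1)

print(f"fitted slope: {slope:.2f}")
print(f"mean vertical offset from the 45 degree line: {mean_offset:.2f}")
```

A slope near 1 with a nonzero offset corresponds to points that run parallel to the 45 degree reference line but are displaced from it. A slope clearly different from 1 would instead suggest a difference in scale.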

The q-q plot is similar to a probability plot. For a probability plot, the quantiles for one of the data samples are replaced with the quantiles of a theoretical distribution.
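As a rough illustration of the probability-plot variant, the sketch below pairs sorted sample values with quantiles of a standard normal distribution. The function name `normal_probability_pairs`, the choice of the normal distribution, the plotting positions (i + 0.5)/n, and the axis order are all assumptions; other plotting-position formulas are in common use.

```python
from statistics import NormalDist


def normal_probability_pairs(data):
    """Pair each sorted sample value with a standard normal quantile.

    Assumption: plotting positions (i + 0.5) / n for i = 0, ..., n - 1.
    Returns (theoretical_quantile, sample_value) pairs.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, xs))


for theo, obs in normal_probability_pairs([4.1, 5.3, 3.8, 6.0, 5.1]):
    print(f"{theo:7.3f}  {obs:5.1f}")
```

If the sample is approximately normal, these pairs should fall close to a straight line.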

Sample Plot:
[Figure: sample quantile-quantile plot]

This q-q plot shows that

  1. These 2 batches do not appear to come from a common distribution.
  2. The batch 1 values are significantly higher than the corresponding batch 2 values.
  3. The differences are increasing from values 525 to 625. Then the values for the 2 batches get closer again.
Definition:
Quantiles for data set 1 versus quantiles of data set 2
The q-q plot is formed by:
  • Vertical axis: Estimated quantiles from data set 1
  • Horizontal axis: Estimated quantiles from data set 2

Both axes are in units of their respective data sets. That is, the actual quantile level is not plotted. For a given point on the q-q plot, we know that the quantile level is the same for both coordinates, but not what that quantile level actually is.

If the data sets have the same size, the q-q plot is essentially a plot of the sorted data set 1 against the sorted data set 2. If the data sets are not of equal size, the quantiles are usually picked to correspond to the sorted values from the smaller data set and then the quantiles for the larger data set are interpolated.
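The construction described above can be sketched as follows. The function names `interpolated_quantile` and `qq_pairs`, the quantile levels i/(n - 1), and the linear interpolation rule are assumptions made for illustration; a statistical package may use different conventions. The smaller data set contributes its sorted values directly, and the larger data set is interpolated at the matching quantile levels.

```python
def interpolated_quantile(sorted_xs, p):
    """Quantile at level p by linear interpolation in already-sorted data."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])


def qq_pairs(data1, data2):
    """Return q-q points as (horizontal, vertical) pairs.

    Horizontal: estimated quantiles of data set 2.
    Vertical:   estimated quantiles of data set 1.
    Assumption: the i-th sorted value of the smaller set is treated as
    the quantile at level i / (n - 1), where n is the smaller size.
    """
    s1, s2 = sorted(data1), sorted(data2)
    if not s1 or not s2:
        raise ValueError("both data sets must be non-empty")
    n = min(len(s1), len(s2))
    levels = [0.5] if n == 1 else [i / (n - 1) for i in range(n)]

    def at_levels(s):
        if len(s) == n:
            return list(s)  # smaller (or equal-size) set: sorted values as-is
        return [interpolated_quantile(s, p) for p in levels]  # larger set

    return list(zip(at_levels(s2), at_levels(s1)))


# Unequal sizes: 3 points against 5 points gives 3 q-q points.
for x, y in qq_pairs([3, 1, 2], [10, 20, 30, 40, 50]):
    print(x, y)
```

The resulting pairs can be passed to any scatter-plot routine, with the line y = x drawn as the 45 degree reference line.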

Questions:
The q-q plot is used to answer the following questions:
  • Do two data sets come from a common distribution?
  • Do two data sets have common location and scale?
  • Do two data sets have similar distributional shapes?
  • Do two data sets have similar tail behaviour?
Importance:
Check for common distribution
When there are two data samples, it is often desirable to know if the assumption of a common distribution is justified. If so, then location and scale estimators can pool both data sets to obtain single common estimates. If the two samples do differ, it is also useful to gain some understanding of the differences. The q-q plot can provide more insight into the nature of the difference than analytical methods such as the chi-square or Kolmogorov-Smirnov 2-sample tests.
Related Techniques:
  • Bi-Histogram
  • T Test
  • F Test
  • 2-Sample Chi-Square Test
  • 2-Sample Kolmogorov-Smirnov Test
Case Study:
The quantile-quantile plot is demonstrated in the ceramic strength data case study.
Software:
Q-Q plots are available in some general purpose statistical software programs, including Dataplot. If the number of data points in the two samples is equal, it should be relatively easy to write a macro in statistical programs that do not support the q-q plot. If the number of points is not equal, writing a macro for a q-q plot may be difficult.