1.3.3.22. Probability Plot

Purpose:
Check if data follow a given distribution
The probability plot (Chambers 1983) is a graphical technique for assessing whether or not a data set follows a given distribution such as the normal or Weibull.

The data are plotted against a theoretical distribution in such a way that the points should form a straight line. Departures from this straight line indicate departures from the specified distribution.

The correlation coefficient associated with the linear fit to the data in the probability plot is a measure of the goodness of the fit. Estimates of the location and scale parameters of the distribution are given by the intercept and slope of the fitted line, respectively. Probability plots can be generated for several competing distributions to see which provides the best fit; the distribution whose plot has the highest correlation coefficient gives the straightest probability plot and is therefore the best choice among the candidates.
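
As a rough illustration of comparing competing distributions, the sketch below uses SciPy's `scipy.stats.probplot`, which returns the correlation coefficient of the line fitted to the probability plot. The simulated data, the seed, and the list of candidates are assumptions made for this example only.

```python
import numpy as np
from scipy import stats

# Simulated data (an assumption for illustration): 500 Weibull values, shape 2.
rng = np.random.default_rng(0)
data = stats.weibull_min.rvs(2.0, size=500, random_state=rng)

# Candidate distributions as (SciPy name, shape parameters).
# Shape parameters must be known in advance and are passed via sparams.
candidates = {
    "normal": ("norm", ()),
    "exponential": ("expon", ()),
    "Weibull (shape 2)": ("weibull_min", (2.0,)),
}

results = {}
for label, (dist, sparams) in candidates.items():
    # With fit=True, probplot returns ((medians, ordered data), (slope, intercept, r)).
    _, (slope, intercept, r) = stats.probplot(data, sparams=sparams, dist=dist, fit=True)
    results[label] = r
    print(f"{label:20s} correlation coefficient = {r:.4f}")

# The candidate with the highest correlation has the straightest plot.
best = max(results, key=results.get)
print("Straightest probability plot:", best)
```

The comparison only ranks the candidates that were tried; a high correlation for the best candidate does not rule out a better distribution outside the list.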

For distributions with shape parameters (not counting location and scale parameters), the shape parameters must be known in order to generate the probability plot. For distributions with a single shape parameter, the probability plot correlation coefficient (PPCC) plot provides an excellent method for estimating the shape parameter.
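
The idea behind the PPCC approach can be sketched by computing the probability plot correlation coefficient over a grid of trial shape values and keeping the value that maximizes it. The data, the seed, and the grid limits below are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Simulated data (an assumption for illustration): 500 Weibull values, shape 2.
rng = np.random.default_rng(1)
data = stats.weibull_min.rvs(2.0, size=500, random_state=rng)

# Grid of trial shape values (an arbitrary choice for this sketch).
shapes = np.linspace(0.5, 5.0, 91)

# Correlation coefficient of the Weibull probability plot at each trial shape.
ppcc = np.array([
    stats.probplot(data, sparams=(c,), dist="weibull_min", fit=True)[1][2]
    for c in shapes
])

# The shape giving the straightest plot is taken as the estimate.
best_shape = shapes[np.argmax(ppcc)]
print(f"Shape maximizing the PPCC: {best_shape:.2f} (r = {ppcc.max():.4f})")
```

SciPy also provides `scipy.stats.ppcc_plot` and `scipy.stats.ppcc_max` for the same purpose; the explicit grid is used here only to keep each step visible.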

Sample Plot:
[Figure: sample Weibull probability plot]

These data are a set of 500 Weibull random numbers with shape parameter = 2, location parameter = 0, and scale parameter = 1. The Weibull probability plot indicates that the Weibull distribution does in fact fit these data well.

Definition:
Ordered response values versus order statistic medians for the given distribution
The probability plot is formed by:
  • Vertical axis: Ordered response values
  • Horizontal axis: Order statistic medians for the given distribution

The order statistic medians are defined as:

    N(i) = G(U(i))
where U(i) are the uniform order statistic medians (defined below) and G is the percent point function for the desired distribution. The percent point function is the inverse of the cumulative distribution function (probability that x is less than or equal to some value). That is, given a probability, we want the corresponding x of the cumulative distribution function.

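The uniform order statistic medians are defined as:

    U(i) = 1 - U(n)                   for i = 1
    U(i) = (i - 0.3175)/(n + 0.365)   for i = 2, 3, ..., n-1
    U(i) = 0.5^(1/n)                  for i = n

where n is the number of observations.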

In addition, a straight line can be fit to the points and added as a reference line. The further the points vary from this line, the greater the indication of departures from the specified distribution.

This definition implies that a probability plot can be easily generated for any distribution for which the percent point function can be computed.
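
The construction can be sketched with the Python standard library alone. The sketch assumes Filliben's approximation for the uniform order statistic medians and uses the closed-form percent point function of the standard Weibull distribution; the function names and the simulated sample are hypothetical and serve only as illustration.

```python
import math
import random

def uniform_order_statistic_medians(n):
    """Approximate medians of the order statistics of n uniform(0, 1) draws
    (Filliben's approximation, assumed here)."""
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[-1] = 0.5 ** (1.0 / n)   # i = n
    u[0] = 1.0 - u[-1]         # i = 1
    return u

def weibull_ppf(p, shape):
    """Percent point function G of the Weibull with location 0 and scale 1."""
    return (-math.log(1.0 - p)) ** (1.0 / shape)

def probability_plot_points(data, ppf):
    """Return (horizontal, vertical) coordinates of the probability plot."""
    ordered = sorted(data)                           # vertical axis
    medians = [ppf(u) for u in                       # horizontal axis: N(i) = G(U(i))
               uniform_order_statistic_medians(len(ordered))]
    return medians, ordered

# Simulated sample (an assumption): 500 Weibull values, scale 1, shape 2.
rng = random.Random(0)
sample = [rng.weibullvariate(1.0, 2.0) for _ in range(500)]

x, y = probability_plot_points(sample, lambda p: weibull_ppf(p, 2.0))
print(x[:3], y[:3])
```

Any other distribution can be substituted by passing a different percent point function as `ppf`.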

One advantage of this method of computing probability plots is that the intercept and slope estimates of the fitted line are in fact estimates for the location and scale parameters of the distribution. Although this is not too important for the normal distribution (the location and scale are estimated by the mean and standard deviation respectively), it can be useful for many other distributions.
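
As an illustration of reading location and scale from the fitted line, the sketch below simulates Weibull data with a known location and scale and compares them with the intercept and slope returned by `scipy.stats.probplot`. The parameter values and seed are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Assumed "true" parameters for the simulated data.
true_loc, true_scale, shape = 10.0, 3.0, 2.0

rng = np.random.default_rng(2)
data = stats.weibull_min.rvs(shape, loc=true_loc, scale=true_scale,
                             size=500, random_state=rng)

# Plot against the standard Weibull (location 0, scale 1) with the known shape;
# the least-squares line through the points then has
# intercept ~ location and slope ~ scale.
(medians, ordered), (slope, intercept, r) = stats.probplot(
    data, sparams=(shape,), dist="weibull_min", fit=True
)

print(f"location estimate (intercept): {intercept:.3f}  (true {true_loc})")
print(f"scale estimate (slope):        {slope:.3f}  (true {true_scale})")
print(f"correlation coefficient:       {r:.4f}")
```

These least-squares estimates are convenient starting values; they are not necessarily the most efficient estimates of the parameters.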

Questions:
The probability plot is used to answer the following questions:
  • Does a given distribution, such as the Weibull, provide a good fit for my data?
  • What distribution best fits my data?
  • What are good estimates for the location and scale parameters of the chosen distribution?
Importance:
Check distributional assumption
The underlying assumptions for a measurement process are that the data should behave like:
  1. random drawings;
  2. from a fixed distribution;
  3. with fixed location;
  4. with fixed scale.
Probability plots are used to assess the assumption of fixed distribution. In particular, most statistical models are of the form:
    response = deterministic + random
where the deterministic part is the fit and the random part is error. This error component in most common statistical models is specifically assumed to be normally distributed with fixed location and scale. This is the most frequent application of normal probability plots. That is, a model is fit and a normal probability plot is generated for the residuals from the fitted model. If the residuals from the fitted model are not normally distributed, then one of the major assumptions of the model has been violated.
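
The residual check described above can be sketched as follows: a straight-line model is fit to simulated data, and the residuals are examined with a normal probability plot. The model, the noise level, and the seed are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Simulated response = deterministic + random (all values are assumptions).
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 200)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=x.size)

# Deterministic part: least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Normal probability plot of the residuals; the fitted line's intercept and
# slope estimate the location and scale of the error distribution.
(medians, ordered), (res_scale, res_loc, r) = stats.probplot(
    residuals, dist="norm", fit=True
)

print(f"correlation coefficient: {r:.4f}")
print(f"residual location ~ {res_loc:.3f}, residual scale ~ {res_scale:.3f}")
```

A correlation coefficient close to 1 is consistent with normally distributed errors; a markedly curved plot would suggest that the normality assumption is violated.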

Some statistical models assume data come from a specific type of distribution. For example, in reliability applications, the Weibull, lognormal, and exponential are commonly used distributional models.

Related Techniques
Case Study:
The probability plot is demonstrated in the airplane glass failure time data case study.
Software:
Most general purpose statistical software programs support probability plots at least for a few common distributions. Dataplot supports probability plots for a large number of distributions.