Non-Normal Distributions - Assessing the Fit: Quantile and Probability Plots

For each distribution, Statistica computes the table of expected and observed frequencies and the respective Chi-square goodness-of-fit test, as well as the Kolmogorov-Smirnov d test.

However, the best way to assess the quality of the fit of a theoretical distribution to an observed distribution is to review a plot of the observed distribution against the fitted theoretical distribution. Two standard types of plots are used for this purpose: quantile-quantile (Q-Q) plots and probability-probability (P-P) plots.

Quantile-quantile (Q-Q) plots

In quantile-quantile plots (Q-Q plots), the observed values of a variable are plotted against the theoretical quantiles.

To produce a Q-Q plot, Statistica first sorts the n observed data points into ascending order, so that:

x_1 <= x_2 <= ... <= x_n

These observed values are plotted on one axis of the graph; the other axis shows:

F^{-1}((i - r_adj) / (n + n_adj))

where i is the rank of the respective observation, r_adj and n_adj are adjustment factors (<= 0.5), and F^{-1} denotes the inverse of the probability integral (that is, the quantile function) for the respective standardized distribution. The resulting plot is a scatterplot of the observed values against the (standardized) expected values, given the respective distribution. In addition to the inverse probability integral values, Statistica also shows the corresponding cumulative probability values on the opposite axis; that is, the plot shows not only the standardized values for the theoretical distribution but also the respective p-values.

A good fit of the theoretical distribution to the observed values is indicated when the plotted points fall on, or close to, a straight line. The adjustment factors r_adj and n_adj ensure that the p-value passed to the inverse probability integral falls strictly between 0 and 1, excluding both endpoints (see Chambers, Cleveland, Kleiner, and Tukey, 1983). In Statistica, the default value for both adjustment factors is 1/3 (approximately 0.333).
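The computation described above can be sketched in a few lines of Python. This is only an illustration of the formula, not Statistica's own implementation: the function names, the made-up sample, and the choice of a standardized exponential distribution (whose inverse probability integral is -ln(1 - p)) are assumptions introduced for the example.

```python
import math


def qq_points(data, inv_cdf, r_adj=1/3, n_adj=1/3):
    """Return (theoretical quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(data)                       # x_1 <= x_2 <= ... <= x_n
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):     # i is the rank of the observation
        # Adjusted cumulative probability; strictly inside (0, 1)
        # for the default adjustment factors of 1/3.
        p = (i - r_adj) / (n + n_adj)
        points.append((inv_cdf(p), x))
    return points


def std_exponential_inv_cdf(p):
    """Inverse CDF of the standardized exponential distribution (scale 1)."""
    return -math.log(1.0 - p)


sample = [0.3, 1.9, 0.7, 3.2, 1.1]          # made-up illustrative data
for q, x in qq_points(sample, std_exponential_inv_cdf):
    print(f"{q:.3f}  {x}")
```

Plotting the second column against the first gives the Q-Q scatterplot. Passing a different inverse function as inv_cdf adapts the sketch to another standardized distribution.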

Probability-probability plots

In probability-probability plots (P-P plots) the observed cumulative distribution function is plotted against the theoretical cumulative distribution function.

As in the Q-Q plot, the values of the respective variable are first sorted into ascending order. The ith observation is plotted on one axis at i/n (the observed cumulative distribution function) and on the other axis at F(x_(i)), the value of the theoretical cumulative distribution function at the ith sorted observation x_(i). If the theoretical cumulative distribution approximates the observed distribution well, all points in this plot should fall on, or close to, the diagonal line.
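A minimal Python sketch of the P-P coordinates follows. It is an illustration under stated assumptions rather than Statistica's procedure: the function names and sample data are hypothetical, and the theoretical distribution is taken to be an exponential whose scale is estimated by the sample mean.

```python
import math
from statistics import mean


def pp_points(data, cdf):
    """Return (observed CDF, theoretical CDF) pairs for a P-P plot."""
    xs = sorted(data)                       # ascending order, as for the Q-Q plot
    n = len(xs)
    # The ith sorted observation contributes the point (i/n, F(x_(i))).
    return [(i / n, cdf(x)) for i, x in enumerate(xs, start=1)]


sample = [0.3, 1.9, 0.7, 3.2, 1.1]          # made-up illustrative data
scale = mean(sample)                        # assumption: scale fitted by the sample mean


def fitted_exponential_cdf(x):
    """CDF of the fitted exponential distribution (illustrative choice)."""
    return 1.0 - math.exp(-x / scale)


for observed_p, theoretical_p in pp_points(sample, fitted_exponential_cdf):
    print(f"{observed_p:.2f}  {theoretical_p:.3f}")
```

The closer the two columns agree, the closer the plotted points lie to the diagonal.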