Workspace Node: Descriptive Statistics - Results - Normality Tab
In the Descriptive Statistics node dialog box, under the Results heading, select the Normality tab to access the following options. Select the check boxes for the statistics and/or graphs to be produced and placed in the Reporting Documents after running (updating) the project.
- Distribution
- A variety of options are contained in this group box to create statistics, graphs, and summary results to help you explore the normality of the selected variables.
- Frequency tables
- Select this check box to produce a cascade of spreadsheets with the frequency distributions for the selected variables (one spreadsheet per variable). The manner in which the selected variables will be categorized depends on the selections made in the Categorization group box (see below). Note that an extensive selection of categorization methods and frequency table statistics is available via the Frequency Tables node.
- Histograms
- Select this check box to create a cascade of histograms, analogous to the Frequency tables. If Normal expected frequencies is selected, the histograms will also display the normal curve superimposed over the observed frequencies.
- Categorization
- The options selected in the Categorization group box will only affect the frequency tables and histograms produced via the options in the Distribution group box. There are two modes available for categorizing the values of selected variables for the frequency table; a number of additional options, graphs, and statistics are available via the Frequency Tables node.
- Number of intervals
- Select this option button to divide the range of values for the selected variables into approximately the specified number of intervals (entered in the corresponding edit field) for subsequent frequency table spreadsheets or histograms. This option is appropriate when the variables to be tabulated are continuous in nature. The tests of normality (Normal expected frequencies, Kolmogorov-Smirnov & Lilliefors test, and Shapiro-Wilk W test) are only available if this option is selected. Note that the actual number of categories that will be produced may sometimes differ from the number of intervals specified. Statistica produces "neat" intervals; that is, interval boundaries and widths with the last digit being 1, 2, or 5 (e.g., 10.5, 11.0, 11.5, etc.). Such "simple" or "neat" intervals are more easily interpreted than interval boundaries defined by many significant digits (e.g., 10.12423, 10.13533, etc.). Full control over the method of categorization of variables is available via the Frequency Tables node.
- Integer intervals (categories)
- Select this method of categorization if the variables to be tabulated can be interpreted as integer categories, or contain only integer values. If this method is selected, all non-integer values will be ignored when producing Frequency tables or Histograms via the Distribution options; the choice of Categorization in this group box will not affect other computations (e.g., of Detailed descriptive statistics on the Advanced or Quick tabs).
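The "neat" interval behavior described for the Number of intervals option can be illustrated with a small sketch. This is not Statistica's actual algorithm, which is not documented here; it only assumes that the raw interval width is rounded up to 1, 2, or 5 times a power of ten and that the first boundary is aligned to a multiple of that width. The function names `neat_step` and `neat_intervals` are hypothetical.

```python
import math


def neat_step(raw_step):
    """Round a raw interval width up to 1, 2, or 5 times a power of ten.

    Assumption for illustration only; Statistica's exact rule may differ.
    """
    if raw_step <= 0:
        raise ValueError("raw_step must be positive")
    base = 10 ** math.floor(math.log10(raw_step))
    for multiplier in (1, 2, 5, 10):
        # Small tolerance so that an exact power of ten is not bumped upward.
        if raw_step <= multiplier * base * (1 + 1e-12):
            return multiplier * base
    return 10 * base


def neat_intervals(lo, hi, n_intervals):
    """Return "neat" interval boundaries covering the range [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    step = neat_step((hi - lo) / n_intervals)
    start = math.floor(lo / step) * step  # align first boundary to the step
    edges = []
    k = 0
    while True:
        edge = start + k * step
        edges.append(round(edge, 10))  # trim floating-point noise
        if edge >= hi:
            break
        k += 1
    return edges


# Requesting 10 intervals for data ranging from 10.12 to 14.87
print(neat_intervals(10.12, 14.87, 10))
# Requesting 10 intervals for data ranging from 0 to 7
print(neat_intervals(0, 7, 10))
```

In the second call the sketch yields 7 intervals of width 1 rather than the 10 requested, which mirrors the note that the actual number of categories may differ from the number specified.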
- Normal expected frequencies
- This check box is only available (active) if the Number of intervals option button is selected in the Categorization group box. When the Normal expected frequencies check box is selected, subsequent spreadsheets will contain the expected normal frequencies (cumulative frequencies and relative frequencies) for each category. The Histograms option will display the normal curve superimposed over the observed frequencies. A wide variety of non-normal distributions can be fit to observed data via the Distribution Fitting node; specialized distributions for survival and reliability studies are available via the Survival Analysis node.
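As a rough illustration of what expected normal frequencies are, the sketch below fits a normal distribution to a sample and computes the expected count in each interval. It assumes the normal curve uses the sample mean and the sample (n - 1) standard deviation, and that cumulative frequencies are running totals over the listed intervals; whether Statistica computes them in exactly this way is not stated in this topic. The function name and the data are made up for the example.

```python
from statistics import NormalDist, mean, stdev


def expected_normal_frequencies(values, edges):
    """Expected counts per interval under a normal fit (illustrative sketch).

    Returns tuples of (lower, upper, expected, cumulative, percent).
    """
    n = len(values)
    dist = NormalDist(mean(values), stdev(values))  # assumed parameter estimates
    rows = []
    cumulative = 0.0
    for lower, upper in zip(edges, edges[1:]):
        prob = dist.cdf(upper) - dist.cdf(lower)  # probability mass of interval
        expected = n * prob
        cumulative += expected
        rows.append((lower, upper, expected, cumulative, 100.0 * prob))
    return rows


# Hypothetical sample and interval boundaries
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 6.4, 7.0, 7.8]
for lower, upper, expected, cumulative, percent in expected_normal_frequencies(
    sample, [4, 5, 6, 7, 8]
):
    print(f"{lower} to {upper}: expected={expected:.2f} "
          f"cumulative={cumulative:.2f} percent={percent:.1f}")
```

The expected counts do not sum to the full sample size because part of the fitted normal curve lies outside the listed intervals.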
- Kolmogorov-Smirnov & Lilliefors test for normality
- This check box is only available (active) if the Number of intervals option button is selected in the Categorization group box. When the Kolmogorov-Smirnov & Lilliefors check box is selected, subsequent frequency spreadsheets will include the results of the Kolmogorov-Smirnov one-sample test of normality. If the D statistic is significant, the hypothesis that the respective distribution is normal should be rejected. Two probability (significance) values are reported for each Kolmogorov-Smirnov D. The first is based on the probability values tabulated by Massey (1951); these apply when the mean and standard deviation of the normal distribution are known a priori and not estimated from the data. The second is the Lilliefors probability (Lilliefors, 1967). Because the mean and standard deviation are typically computed from the actual data, the test for normality then involves a complex conditional hypothesis ("how likely is it to obtain a D statistic of this magnitude or greater, contingent upon the mean and standard deviation computed from the data"), and the Lilliefors probabilities should be interpreted in that case. Note that, in recent years, the Shapiro-Wilk W test (see below) has become the preferred test of normality because of its good power properties as compared to a wide range of alternative tests (see Shapiro, Wilk, & Chen, 1968).
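To make the D statistic concrete, the following sketch computes the Kolmogorov-Smirnov distance between a sample's empirical distribution and a normal distribution whose mean and standard deviation are estimated from the same data (the situation in which the Lilliefors probabilities apply). It computes only D; the Massey and Lilliefors probability values require tables or approximations that are not reproduced here. The function name `ks_d_normal` and the sample data are hypothetical, and the use of the n - 1 standard deviation is an assumption.

```python
from statistics import NormalDist, mean, stdev


def ks_d_normal(values):
    """Kolmogorov-Smirnov D against a normal fitted to the data (sketch)."""
    xs = sorted(values)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))  # parameters estimated from data
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# A sample built from normal quantiles versus a strongly skewed sample
near_normal = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [i ** 3 for i in range(1, 31)]
print(f"D (near normal) = {ks_d_normal(near_normal):.4f}")
print(f"D (skewed)      = {ks_d_normal(skewed):.4f}")
```

A larger D indicates a larger departure from the fitted normal curve; whether it is significant depends on the sample size and on which probability value is consulted.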
- Shapiro-Wilk W test
- This check box is only available (active) if the Number of intervals option button is selected in the Categorization group box. When the Shapiro-Wilk W test check box is selected, subsequent frequency spreadsheets will include the results of the Shapiro-Wilk W test of normality. If the W statistic is significant, the hypothesis that the respective distribution is normal should be rejected. The Shapiro-Wilk W test is the preferred test of normality because of its good power properties as compared to a wide range of alternative tests (see Shapiro, Wilk, & Chen, 1968). The algorithm implemented in Statistica employs an extension to the test described in Royston (1992), which allows it to be applied to samples with up to 5,000 observations; if there are more than 5,000 observations, this test cannot be performed.
- 3D histograms, bivariate distributions
- Select this check box to produce a cascade of 3D histograms for selected pairs of variables, one plot per pair. You will first be prompted to select two lists of variables (from among those originally selected via the Variables button) via a standard variable selection dialog box. 3D bivariate histograms will be produced for each variable in the first list with each variable in the second list.
Stem and leaf. The stem and leaf plot (Tukey, 1972) is an alternative to the histogram. Like the histograms, stem and leaf plots will be produced for the selected variables.
- Stem & leaf plot
- Select this option to create a stem and leaf plot. In this plot, each stem represents an interval, just as in a regular histogram. However, whereas a histogram uses a vertical "bar" to indicate the number of cases that fall into the respective interval, the stem and leaf plot shows the actual values as leaves of the stem. The cases of the stem and leaf plot are displayed in the following format: stem°leaf. Hence, if leaf unit is 1.000000 and stem and leaf value is 7°000038, this means that for the specified variable, there are four 7.0 values, one 7.3 value, and one 7.8 value.
- Compressed
- Select this check box to reduce the number of intervals created for the stem and leaf plot. When Compressed is selected, fewer intervals will be displayed on the stem and leaf plot.
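The stem°leaf display format described above can be reproduced with a short sketch. It assumes non-negative values, takes the integer part as the stem and the first decimal digit as the leaf, and uses one stem per integer; it does not attempt to model the Compressed option or Statistica's own choice of stems. The function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values, sep="°"):
    """Build stem°leaf rows: stem = integer part, leaf = first decimal digit.

    Illustrative only; assumes non-negative values rounded to one decimal.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        tenths = int(round(value * 10))  # value expressed in tenths
        stem, leaf = divmod(tenths, 10)
        rows[stem].append(str(leaf))
    return [f"{stem}{sep}{''.join(leaves)}" for stem, leaves in sorted(rows.items())]


# Four 7.0 values, one 7.3, one 7.8, plus one 8.1
for row in stem_and_leaf([7.0, 7.3, 7.0, 7.8, 7.0, 7.0, 8.1]):
    print(row)
```

The first row printed for this data is 7°000038, matching the example in the Stem & leaf plot description.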
Options / C / W. See Common Options.
OK. Click the OK button to accept all the specifications made in the dialog box and to close it. The analysis results will be placed in the Reporting Documents node after running (updating) the project.