Variables
|
Click this button to display a variable selection dialog box, in which you select the grouping (category) and dependent variables for the graph. If you select a category variable and more than one dependent variable for
Regular box plots (see the graph formats below in the
Graph type option description), a sequence of graphs (one for each dependent variable) is produced; if the format is set to
Multiple, box plots for all selected variables are combined in one graph.
Note: if you select only one dependent variable and no category variable, a single box plot representing the distribution of the dependent variable is produced. If you select multiple dependent variables and no category variable, a single graph with box plots for each selected variable is produced. The latter plot is useful for comparing the distribution of several variables.
|
Graph type
|
Select the type of box plot to be plotted. Click a link below for a brief description of that type of graph.
Box Whiskers
High-Low Close
Boxes
Columns
Whiskers
You can also choose between two types of graph formats. Click the links below to learn more about the formats.
Regular
Multiple
|
Grouping intervals
|
Specify a method of categorization for the selected variable(s). Each of the methods is discussed in Method of Categorization.
|
Change Variable
|
Click this button to display a variable selection dialog box in which you can change the selection of the
Dependent variables and/or the
Grouping variables. If you change the variable(s) using this button, display of the variable selection under the
Variables button changes accordingly.
|
Fit type
|
You can choose to fit an equation to the mid-points in the box plots by selecting one of the predefined functions.
Linear
Polynomial
Logarithmic
Exponential
Distance Weighted
Negative Exponential Weighted
Spline
Lowess
Note: The graph icon above the
Middle point group box represents the currently selected
Graph type and the selection of values for the
Middle point,
Box, and
Whiskers. The graph icon previews these three selections and the specific statistics that define the current box plot.
|
|
Middle point
|
The options in this group box control the type of value and appearance of the middle point.
|
Value
|
The middle point can be either the
Mean,
Median,
Mean/Median (uses the Mean as the middle point, plus it has an added marker for the Median), or
Median/Mean (uses the Median as the middle point, plus it has an added marker for the Mean) of the selected variable. The options available for the
Box and
Whisker depend on this selection.
|
Style
|
Specify how the middle point will be represented (by a Line or Point).
|
Pooled variance
|
This check box is available when you select
Mean as the
Middle point Value. The setting of this check box determines how the standard deviations and standard errors (for the means) are computed from grouped data. When the
Pooled variance check box is selected, Statistica computes the pooled within-group (category) variance for all groups (categories), and uses this value as an estimate of σ (sigma) when computing the standard errors for the means (see, for example, Milliken and Johnson, 1984). Specifically, the program computes the pooled within-group (category) variance as:
$$ s_{\mathrm{pooled}}^2 = \frac{1}{n-k} \left[ s_1^2 (n_1 - 1) + \dots + s_k^2 (n_k - 1) \right] $$
In this equation,
$k$ is the number of groups in the plot,
$s_i^2$ is the variance in the i'th category or group,
$n_i$ is the number of valid observations in the i'th category or group, and
$n$ is the overall number of valid observations in the plot.
The standard error of the mean for the i'th group is then computed as:
$$ \mathrm{s.e.}(\mathrm{mean}_i) = \frac{s_{\mathrm{pooled}}}{\sqrt{n_i}} $$
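The pooled-variance computation described above can be sketched in a few lines of Python. This is an illustration only, not Statistica's implementation; the function name `pooled_standard_errors` and the sample data are hypothetical, and every group is assumed to contain at least two valid observations.

```python
from math import sqrt
from statistics import variance


def pooled_standard_errors(groups):
    """Return (pooled variance, standard error of the mean for each group).

    `groups` is a list of lists of valid observations, one list per category.
    Hypothetical helper: each group needs at least two observations.
    """
    k = len(groups)                      # number of groups in the plot
    n = sum(len(g) for g in groups)      # overall number of valid observations
    # Sum of s_i^2 * (n_i - 1) over all groups
    weighted = sum(variance(g) * (len(g) - 1) for g in groups)
    s2_pooled = weighted / (n - k)
    s_pooled = sqrt(s2_pooled)
    # Standard error for group i: s_pooled / sqrt(n_i)
    return s2_pooled, [s_pooled / sqrt(len(g)) for g in groups]


# Made-up example data: two categories
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
s2_pooled, std_errors = pooled_standard_errors(groups)
```

Because every group shares the same pooled estimate, the standard errors differ between groups only through the group sizes.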
|
|
Multiple box layout
|
When the
Multiple box plot style is selected for the
Graph type format (see above), you can choose to display the boxes, whiskers, columns or box-whiskers in one of two styles:
|
Overlaid
|
Series of box plots are displayed one on top of the other.
|
Shifted
|
Series of box plots are displayed side by side.
|
|
Trim distrib. extremes
|
Specify the percentage of cases to be trimmed from each extreme (i.e., tail) of the distribution of the selected variable. For example, if you specify 10% for a variable with 100 cases, Statistica removes the 10 lowest-value cases and the 10 highest-value cases from the distribution and plots only the 80 middle cases. If you enter a value for Trim distrib. extremes for a mean-based box plot, so-called trimmed means are plotted.
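As a rough illustration of the trimming rule, the following Python sketch reproduces the 10% example above. The function name `trim_extremes` is hypothetical, and rounding the number of trimmed cases down is an assumption; the text does not say how Statistica handles fractional counts.

```python
from statistics import mean


def trim_extremes(values, percent):
    """Drop `percent` percent of the cases from each tail; return the rest, sorted."""
    ordered = sorted(values)
    # Number of cases removed per tail (rounding down is an assumption)
    cut = int(len(ordered) * percent / 100)
    return ordered[cut:len(ordered) - cut]


cases = list(range(1, 101))        # 100 made-up cases with values 1..100
kept = trim_extremes(cases, 10)    # removes the 10 lowest and the 10 highest
trimmed_mean = mean(kept)          # the mean plotted in a mean-based box plot
```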
|
|
Statistics
|
Select statistics to be included as footnotes in the graph.
|
Kruskal-Wallis
|
Select this check box to include the Kruskal-Wallis test statistic as a footnote on the graph.
|
F test and p (ANOVA)
|
Select this check box to include the ANOVA F statistic and its p-value as a footnote on the graph.
|
|
Box
|
- If
Median is specified as the
Middle point, the range (box) can be represented by
Percentiles or the
Min-Max values of the selected variable, or a specified
Constant value (when you want a fixed size box around the medians).
- If
Mean is specified as the
Middle point, the range (box) can be defined in terms of standard deviations (Std Dev), standard errors (Std Error), Conf. Interval,
Min-Max values of the selected variable, or a specified
Constant value (when you want a fixed size box around the means).
You can also specify a
Coefficient, by which the selected range value is multiplied. Note that, except for unusual applications, the default value of the coefficient should not be changed if the box
Value is
Min-Max. If
Median is specified as the
Middle point, and
Percentiles is specified as the
Value, the
Coefficient entered must be between 0.01 and 50.0. If
Mean is specified as the
Middle point, and Conf. Interval is specified as the
Value, the
Coefficient entered must be between 0.15 and 0.9999.
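To make the role of the Coefficient concrete, here is a minimal Python sketch for a mean-based box defined by standard deviations. It assumes the box extends symmetrically around the mean by the coefficient times the standard deviation; the function name `mean_box` and the data are hypothetical, and this is not Statistica's own code.

```python
from statistics import mean, stdev


def mean_box(values, coefficient=1.0):
    """Return (lower, upper) box edges: mean -/+ coefficient * Std Dev.

    Assumption: the box is symmetric around the mean.
    """
    m = mean(values)
    half_height = coefficient * stdev(values)
    return m - half_height, m + half_height


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up values
box_low, box_high = mean_box(data, coefficient=2.0)  # box spans mean +/- 2 SD
```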
|
|
Whisker
|
- If
Median is specified as the
Middle point, the range (whiskers) can be represented by
Percentiles or the
Min-Max values of the selected variable, a specified
Constant value, or Non-outlier range (see Outliers and extremes).
- If
Mean is specified as the
Middle point, the range (whiskers) can be defined in terms of standard deviations (Std Dev), standard errors (Std Error), or
Min-Max values of the selected variable, or
Non-outlier range.
If you select
Non-outlier range, Statistica determines which points in the data set are outliers (see Outliers and extremes), and then draws the whiskers to the highest and lowest data points that are not outliers.
You can also specify a
Coefficient by which the selected range value will be multiplied. In most typical applications the coefficient should be set to 1 when the value of the whisker is
Min-Max or
Non-outlier range.
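The non-outlier range can be sketched as follows. The outlier rule itself is defined in Outliers and extremes and is not reproduced here, so this example assumes a common convention: a quartile-based box, with points lying more than 1.5 box heights beyond the box edges treated as outliers. The function name `non_outlier_range`, the `coef` parameter, and the quartile method are all assumptions.

```python
from statistics import quantiles


def non_outlier_range(values, coef=1.5):
    """Return (lowest, highest) data points that are not outliers.

    Assumed rule: a point is an outlier if it lies more than
    coef * (box height) below the 25th or above the 75th percentile.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    height = q3 - q1
    low_fence = q1 - coef * height
    high_fence = q3 + coef * height
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # 50 is far from the rest
whisker_low, whisker_high = non_outlier_range(data)
```

With these made-up values the whiskers end at 1 and 9, and the value 50 falls outside the non-outlier range.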
|
|
Outliers
|
You can elect to display none (select
Off), only
Outliers, only
Extreme values, or both
Outliers & Extremes in the box plot. For more details, see Outliers and Extremes.
|
Connect middle points
|
In box plots, the middle point (the Mean or
Median, or the trimmed
Mean or
Median, of the selected variable) is drawn in the selected style (point or line; see
Middle point above). Select this check box to connect the middle points of the box plots with a line. This produces, for example, a line plot (or categorized line plot) of means with error bars, or a line plot (or categorized line plot) of medians with quantile and min-max range bars. Selecting the
Overlaid option button in the
Multiple box layout group box aligns the respective plots from each line. The
Connect middle points format is also used in some statistical procedures to create predefined graphical output.
|
Display raw data
|
Select this check box to display the raw data points.
|
|
Jitter
|
Use the options in this group box to jitter the data points, i.e., to shift data points away from their original position so that overlapping points are easier to identify or brush.
|
Off
|
No jitter is applied to the raw data points, outliers, and extremes.
|
Sequential
|
The jitter is applied sequentially to the raw data points, outliers, and extremes. The jitter is applied such that the first case in the data set is maximally shifted to the left and the last case is shifted maximally to the right.
|
Random
|
The data point is randomly shifted within the available range.
|
|
Width
|
Specify the maximum jitter width defined as percentage of box width. Possible percentages range from 0 to 250.
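The Sequential and Random jitter modes can be sketched in Python as below. This is not Statistica's algorithm: the function names are hypothetical, the jitter width is assumed to be a total span centered on the box, and the even spacing between the first and last case in sequential mode is an assumption (the text only fixes the two end points).

```python
import random


def sequential_jitter(n_cases, box_width, width_percent):
    """Horizontal offsets: first case furthest left, last case furthest right."""
    half = box_width * width_percent / 100 / 2   # half of the jitter span
    if n_cases == 1:
        return [0.0]
    step = 2 * half / (n_cases - 1)              # even spacing (assumption)
    return [-half + i * step for i in range(n_cases)]


def random_jitter(n_cases, box_width, width_percent, seed=None):
    """Horizontal offsets drawn uniformly within the available range."""
    rng = random.Random(seed)
    half = box_width * width_percent / 100 / 2
    return [rng.uniform(-half, half) for _ in range(n_cases)]


seq_offsets = sequential_jitter(5, box_width=1.0, width_percent=50)
rnd_offsets = random_jitter(5, box_width=1.0, width_percent=50, seed=0)
```

Each offset would be added to the horizontal position of the corresponding raw data point, outlier, or extreme before it is drawn.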
Options / C / W. See Common Options.
|
OK
|
Click this button to accept all the specifications made in the dialog box and to close it. The results are placed in the
Reporting Documents workspace node after running (updating) the project.
|