Categorized Means with Error Plots - Advanced Tab

Graphical Analytic Techniques

The Advanced tab of the Categorized Means with Error Plots Startup Panel contains options for Means with Error Plots in addition to those on the Categorized Means with Error Plots - Quick tab. Use the options on this tab to specify the variables and select the type of graph you want to create. Further options control how the graph is computed and displayed, and some add components such as the fit of a predefined function, outliers and extremes, and certain test statistics.

Graph type

Select the type of Mean with Error Plot to be plotted from the Graph type group box. Click the desired plot link below to obtain a brief description of that type of graph.

Whiskers	Columns
High-Low Close

Layout. Select the type of layout for the graph(s).

Separate

Select this option button to produce a Separate plot layout (where each subset of cases is displayed in a separate graph) for the categorized plots.

Overlaid

Select this option button to produce an Overlaid plot layout (where all subsets are overlaid in one graph and identified by patterns and colors) for the categorized plots.

Variables

Click the Variables button to display a standard variable selection dialog box in which you can select the Dependent variable, the Grouping variable, and the X- and (optional) Y-Category variables for creating the graph. If more than one dependent variable is selected, a sequence of graphs (one for each dependent variable) will be produced using the same set of grouping variables. The selections made will be displayed below the Variables button.

The dependent variable values will be used in calculating the respective statistics that define the components of the graph (e.g., means, medians, standard deviations, etc.), while the grouping variable will be used to categorize the data, using the method of categorization as selected via the options in the Grouping intervals group. Note that the selected grouping variables do not have to be categorical variables (e.g., contain codes); you can use one of the methods of categorization to categorize continuous variables. The selection of grouping variables is not necessary if the categories are defined via the Multiple subsets method in the X-Categories, Y-Categories, and Intervals group boxes.
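As a rough illustration of the paragraph above, the following Python sketch groups the values of a dependent variable by the codes of a grouping variable and computes the per-group statistics that define the graph components. The function name `summarize_by_group` and the sample data are hypothetical; this is not Statistica's implementation, only a minimal sketch of the idea.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize_by_group(values, groups):
    """Return {group: (n, mean, median, std dev)} for paired values and codes."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None:  # skip missing observations
            buckets[group].append(value)
    return {
        g: (len(v), mean(v), median(v), stdev(v) if len(v) > 1 else 0.0)
        for g, v in buckets.items()
    }


# Hypothetical data: one dependent variable and one grouping variable.
dependent = [4.0, 6.0, 5.0, 10.0, 12.0, 14.0]
grouping = ["A", "A", "A", "B", "B", "B"]
summary = summarize_by_group(dependent, grouping)
```

Each entry of `summary` holds the statistics from which one middle point and its error range could be drawn.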

X-Categories / Y-Categories. Categorization is used in two classes of graphs in Statistica: categorized graphs (e.g., Categorized Scatterplots) and graphs that include grouping or categorized variables (e.g., 2D Histograms, or 2D Box Plots).

Select Integer mode, Unique values, or Categories to specify that method of categorization for each of the variables selected via the Change Variable button, or use the Boundaries, Codes, or Multiple subsets options. For more information about each of these methods, see methods of categorization.

Intervals

Use the options in this group box to choose the method of categorization for the selected Grouping variable. Each of the methods is discussed in methods of categorization.

Graph icon

The graph icon in the lower section, left side of the Advanced tab represents the currently selected Graph type (Whiskers, High-Low Close, Columns) and the Middle Point options (see below). It also previews the selected Value (Conf. Interval, Non-outlier range, Min-max, or Constant) that will define the Mean with Error Plot that you are about to create as specified in the Whisker group box.

Middle point

The options in the Middle point group box are used to select the statistic that will be used as the middle point in the Means with Error Plots.

Value

From the Value drop-down list, select the statistic that will determine the center (middle) points in the plot (for each variable and group): Mean, Median, Mean/Median (uses the Mean as the middle point and adds a marker for the Median), or Median/Mean (uses the Median as the middle point and adds a marker for the Mean).

Style

Use the Style drop-down list to specify how the middle point should be represented in the Whiskers or High-Low Close plot. You can choose the selected middle point to appear as a line (select Line) or as a point (select Point).

Pooled variance

The Pooled variance check box is available when you select Mean as the Middle point Value. The setting of this check box determines how the standard deviations and standard errors (for the means) are computed from grouped data. When the Pooled Variance check box is selected, STATISTICA computes the pooled within-group (category) variance for all groups (categories), and uses this value as an estimate of σ (Sigma) in computing the standard errors for the means (see, for example, Milliken and Johnson, 1984). Specifically, STATISTICA computes the pooled within-group (category) variance as:

$$s_{pooled}^2 = \frac{1}{n-k}\left[s_1^2 (n_1 - 1) + \dots + s_k^2 (n_k - 1)\right]$$

In this equation, k is the number of groups in the plot, $s_i^2$ is the variance in the i'th category or group, $n_i$ is the number of valid observations in the i'th category or group, and n is the overall number of valid observations in the plot.

The standard error of the mean for the i'th group is then computed as:

$$\mathrm{s.e.}(\mathrm{mean}_i) = \frac{s_{pooled}}{\sqrt{n_i}}$$
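The two formulas above can be checked with a short Python sketch. The function name `pooled_standard_errors` and the sample groups are hypothetical, and each group is assumed to contain at least two valid observations so that its variance is defined.

```python
from math import sqrt
from statistics import variance


def pooled_standard_errors(groups):
    """Return (pooled variance, list of standard errors of the group means)."""
    k = len(groups)                      # number of groups in the plot
    n = sum(len(g) for g in groups)      # overall number of valid observations
    # Weight each group's sample variance by (n_i - 1) and divide by (n - k).
    pooled_var = sum(variance(g) * (len(g) - 1) for g in groups) / (n - k)
    s_pooled = sqrt(pooled_var)
    # Standard error of the i'th mean: s_pooled / sqrt(n_i).
    return pooled_var, [s_pooled / sqrt(len(g)) for g in groups]


# Hypothetical data for two groups of unequal size.
groups = [[4.0, 6.0, 5.0], [10.0, 12.0, 14.0, 16.0]]
pooled_var, std_errors = pooled_standard_errors(groups)
```

Because the same pooled estimate is used for every group, the standard errors differ between groups only through the group sizes.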

Whisker

The options in the Whisker group box are used to compute the range of Whiskers or High-Low Close, i.e., to define the error ranges.

Value

Use these options to specify how the range of the Whiskers or High-Low Close is computed (Std dev, Std error, Conf. Interval, Non-outlier range, Min-max, or Constant).

When you select Std dev, the specified constant will be multiplied by the standard deviation of the plot data and added/subtracted from the chosen center point to define the range.

When you select Std error, the specified constant will be multiplied by the standard error of the plot data and added/subtracted from the chosen center point to define the range.

When you select Conf. Interval, the range will be displayed as the confidence interval around the mean value.

When you select Non-outlier range, STATISTICA determines which points in the data are outliers (see Outliers and Extremes), and then uses the highest and lowest data points that are closest to the outliers (but are not outliers) to determine the range in the plot.

Alternatively, the Min-max option uses the minimum and maximum values of the data to determine the range, without considering whether or not these values are outliers.

When you select Constant, the specified constant will be added/subtracted from the chosen center point (mean or median), to define the range around that center point.
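The following Python sketch illustrates four of the range definitions described above (Std dev, Std error, Min-max, and Constant), using the mean as the center point. The function name `whisker_range` and its parameters are hypothetical. The standard error is assumed here to be the unpooled standard deviation divided by the square root of the number of cases. Conf. Interval and Non-outlier range are left out because they depend on additional settings.

```python
from math import sqrt
from statistics import mean, stdev


def whisker_range(data, value="Std dev", coefficient=1.0):
    """Return (low, high) of the error range around the mean of data."""
    center = mean(data)
    if value == "Min-max":
        # Uses the data extremes directly, outliers included.
        return min(data), max(data)
    if value == "Std dev":
        half = coefficient * stdev(data)
    elif value == "Std error":
        # Assumption: unpooled standard error, s / sqrt(n).
        half = coefficient * stdev(data) / sqrt(len(data))
    elif value == "Constant":
        half = coefficient               # the constant itself is the half-width
    else:
        raise ValueError(f"unsupported value: {value}")
    return center - half, center + half


sample = [2.0, 4.0, 6.0]                 # hypothetical plot data
sd_range = whisker_range(sample, "Std dev", 1.0)
se_range = whisker_range(sample, "Std error", 2.0)
const_range = whisker_range(sample, "Constant", 1.5)
minmax_range = whisker_range(sample, "Min-max")
```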

Probability/Coefficient

If you set Value (see above) to Conf. Interval, you also need to specify a value between 0.15 and 0.99 in the Probability edit field. This value determines the length of the Whiskers or High-Low Close around the Mean value, based on the standard error of the respective mean and the standard normal (z) value associated with the chosen probability. If you set Value to Non-outlier range or Min-max (see above), you also need to specify a value in the Coefficient edit field, by which the selected Value is multiplied to determine the range. If you set Value to Constant, the Coefficient itself determines the range (no multiplier is used). By default, the Coefficient is 1.
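A minimal Python sketch of the Conf. Interval computation described above, following the text's use of the standard normal (z) value. The function name `confidence_whiskers` is hypothetical. The interval is assumed to be two-sided and the standard error is assumed to be unpooled; neither detail is stated in this section.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_whiskers(data, probability=0.95):
    """Return (low, high) of a z-based confidence interval around the mean."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    # Assumption: two-sided interval, so the z value is taken at the
    # (1 + probability) / 2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    std_error = stdev(data) / sqrt(len(data))
    center = mean(data)
    return center - z * std_error, center + z * std_error


low, high = confidence_whiskers([2.0, 4.0, 6.0], probability=0.95)
```

A larger probability gives a larger z value and therefore longer whiskers.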

Connect middle points

Select the Connect middle points check box to connect the selected middle points (Means, Medians, trimmed Means, or trimmed Medians) of the Whiskers or High-Low Close.

Display raw data

Select this check box to display the raw data points.

Jitter

Use the options in this group box to jitter the data points, i.e., to shift each point slightly from its original position so that overlapping points are easier to identify and brush.

Off

If you select Off, no jitter is applied to the raw data points, outliers, and extremes.

Sequential

If you select Sequential, the jitter is applied sequentially to the raw data points, outliers, and extremes. The jitter is applied such that the first case in the data set is maximally shifted to the left and the last case is shifted maximally to the right.

Random

If you select Random, the data point is randomly shifted within the available range.

Width

Use this option to specify the maximum jitter width as a percentage of the box width. Possible values range from 0 to 250.
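The jitter modes can be sketched in Python as horizontal offsets added to each point's position. The function name `jitter_offsets` and its parameters are hypothetical. The sketch assumes that the jitter width is a total span centered on the category position and that Sequential jitter spaces the cases evenly; the actual spacing rule is not given in this section.

```python
import random


def jitter_offsets(n_points, mode="Sequential", width_percent=50.0,
                   box_width=1.0, seed=None):
    """Return one horizontal offset per case, in the units of box_width."""
    span = box_width * width_percent / 100.0   # maximum jitter width
    half = span / 2.0
    if mode == "Off" or n_points == 0:
        return [0.0] * n_points
    if mode == "Sequential":
        if n_points == 1:
            return [0.0]
        # First case is shifted maximally left, last case maximally right.
        step = span / (n_points - 1)
        return [-half + i * step for i in range(n_points)]
    if mode == "Random":
        rng = random.Random(seed)
        return [rng.uniform(-half, half) for _ in range(n_points)]
    raise ValueError(f"unsupported mode: {mode}")


sequential = jitter_offsets(5, "Sequential", width_percent=100.0)
randomized = jitter_offsets(5, "Random", width_percent=100.0, seed=1)
no_jitter = jitter_offsets(5, "Off")
```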

Outliers

The Outliers group box is used to control the display of outliers and extremes. Select either Off, Outliers, Extreme, or Outl. & Extremes from the drop-down list. See Outliers and Extremes for additional details on these options.

Coefficient

If you select Outliers, Extreme, or Outl. & Extremes in the Outliers drop-down list, specify a coefficient in the Coefficient edit field to be used to determine the outlier or extreme value range; see Outliers and Extremes for additional details.

Fit

You can fit an equation to the points in the plots by selecting one of the predefined functions.

Linear	Distance Weighted Least Squares
Polynomial	Negative Exponential Weighted
Logarithmic	Spline
Exponential	Lowess

Trim distr. extremes.

Use the Trim distr. extremes box to specify the percent of cases to be "trimmed" from the extremes (i.e., tails) of the distributions of cases for the selected dependent variables. For example, if you specify 10%, then for a variable with 100 cases, STATISTICA removes the 10 cases with the lowest values and the 10 cases with the highest values for the respective variable from the graph, and uses only the 80 remaining ("middle") cases. If you enter a value for Trim distr. extremes for mean-based Means with Error Plots, then the so-called "trimmed means" will be used in the graph.
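The 10% example above can be reproduced with a short Python sketch. The function name `trim_extremes` is hypothetical, and rounding the number of trimmed cases down when it is not a whole number is an assumption, since the text only covers the case where it divides evenly.

```python
from statistics import mean


def trim_extremes(values, percent):
    """Drop the given percent of cases from each tail and return the rest."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be at least 0 and less than 50")
    ordered = sorted(values)
    # Assumption: round the per-tail count down to a whole number of cases.
    cut = int(len(ordered) * percent / 100.0)
    return ordered[cut:len(ordered) - cut]


cases = list(range(1, 101))          # a variable with 100 cases
middle = trim_extremes(cases, 10)    # removes the 10 lowest and 10 highest
trimmed_mean = mean(middle)          # mean of the 80 remaining cases
```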
