Variables
|
To display the standard variable selection dialog box, click the Variables button. Note that STATISTICA interprets the selected variables as dimensions if Cases (rows) is selected in the Cluster box (see below); if Variables (columns) is selected in the Cluster box, the selected variables are interpreted as objects.
|
Input File
|
The Input file box contains two options: Raw data and Distance matrix.
Raw data: If you select Raw data, STATISTICA expects a standard raw data file as input.
Distance matrix: If you select Distance matrix, the input matrix may be either a correlation matrix or a distance (dissimilarity) matrix whose entries give the distances or dissimilarities between objects. STATISTICA automatically determines the contents of the matrix, that is, whether it contains correlations or dissimilarities (see Matrix file format). If the input matrix is a correlation matrix (which indicates the similarity or closeness between objects), it is converted to distances before the analysis begins; specifically, all correlations are transformed as 1-Pearson r.
Note that if your Input file consists of correlation coefficients only (for example, from a published source), and no means, standard deviations, or N are available, you can simply assume standardized data (mean = 0, standard deviation = 1) and an N of, for example, 100 (N must be greater than the number of variables in the analysis). You must first add these four cases (means, standard deviations, number of cases, and matrix) to your spreadsheet before you can run the analysis. In that case, the descriptive statistics reported for each variable in the results are not meaningful; however, the cluster analysis itself can be performed based on the correlation coefficients alone.
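The 1-Pearson r conversion described above can be illustrated with a minimal Python sketch. This is not STATISTICA code; the function name correlations_to_distances and the sample matrix are hypothetical and serve only to show the arithmetic.

```python
def correlations_to_distances(corr):
    """Convert a square correlation matrix to dissimilarities as 1 - r.

    Hypothetical helper for illustration; it assumes `corr` is a list of
    equal-length rows holding Pearson correlations.
    """
    return [[1.0 - r for r in row] for row in corr]


# Made-up correlation matrix for three objects.
corr = [
    [1.0, 0.75, -0.5],
    [0.75, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
]

dist = correlations_to_distances(corr)
for row in dist:
    print(row)
```

A correlation of 1 maps to a distance of 0, a correlation of 0 maps to 1, and a correlation of -1 maps to 2, so strongly negatively correlated objects end up farthest apart.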
|
Cluster
|
The Cluster box contains two options: Variables (columns) and Cases (rows). The option you select determines how STATISTICA interprets the selected Variables. Note that the Cluster box is only available if Raw data is selected as the Input file.
Variables (columns): If Variables (columns) is selected, STATISTICA interprets the selected Variables (see above) as objects.
Cases (rows): If Cases (rows) is selected, STATISTICA interprets the selected Variables as dimensions.
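The difference between the two Cluster settings amounts to whether the rows or the columns of the data are treated as the objects to be clustered. The following Python sketch illustrates the idea under that assumption; the function objects_for_clustering and the sample data are hypothetical and do not reflect how STATISTICA stores data internally.

```python
def objects_for_clustering(data, cluster="cases"):
    """Return the list of objects (each a list of coordinates) to cluster.

    Hypothetical helper: `data` is a list of cases (rows), each holding
    one value per selected variable (column).
    """
    if cluster == "cases":
        # Each case is an object; the variables are its dimensions.
        return [list(row) for row in data]
    if cluster == "variables":
        # Each variable is an object; the cases are its dimensions.
        return [list(col) for col in zip(*data)]
    raise ValueError("cluster must be 'cases' or 'variables'")


# Two cases measured on three variables (made-up values).
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

by_cases = objects_for_clustering(data, "cases")
by_variables = objects_for_clustering(data, "variables")
print(len(by_cases), "objects with", len(by_cases[0]), "dimensions")
print(len(by_variables), "objects with", len(by_variables[0]), "dimensions")
```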
|
Amalgamation (linkage) rule
|
There are seven different amalgamation rules available in the Amalgamation (linkage) rule box:
- Single Linkage
- Complete Linkage
- Unweighted pair-group average
- Unweighted pair-group centroid
- Weighted pair-group average
- Weighted pair-group centroid (median)
- Ward's method
One of the main parameters that guides the joining (tree-clustering) process is the linkage rule, that is, the rule that determines when two clusters are to be joined (linked or amalgamated). For a detailed description of amalgamation rules, see Joining (Tree Clustering) Introductory Overview - Amalgamation or Linkage Rules.
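To make the role of the linkage rule concrete, the following Python sketch joins objects step by step from a precomputed distance matrix using three of the rules listed above (single linkage, complete linkage, and unweighted pair-group average). It is a simplified illustration, not STATISTICA's algorithm; the function names and the distance matrix are hypothetical, and the centroid, median, and Ward rules are omitted.

```python
from itertools import combinations


def single_linkage(a, b, d):
    # Distance between the two closest members of the clusters.
    return min(d[i][j] for i in a for j in b)


def complete_linkage(a, b, d):
    # Distance between the two farthest members of the clusters.
    return max(d[i][j] for i in a for j in b)


def unweighted_average(a, b, d):
    # Mean distance over all pairs of members (one from each cluster).
    return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))


def join(d, linkage):
    """Repeatedly merge the two closest clusters under `linkage`.

    Returns a list of (cluster_a, cluster_b, linkage_distance) tuples,
    one per merge, where clusters are tuples of object indices.
    """
    clusters = [(i,) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], d))
        merges.append((a, b, linkage(a, b, d)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Made-up symmetric distance matrix for four objects.
d = [
    [0, 1, 5, 9],
    [1, 0, 2, 8],
    [5, 2, 0, 3],
    [9, 8, 3, 0],
]

for rule in (single_linkage, complete_linkage, unweighted_average):
    print(rule.__name__, [dist for _, _, dist in join(d, rule)])
```

With the same distance matrix, the three rules agree on the first merge but differ afterwards, which is why the choice of amalgamation rule can change the shape of the resulting tree.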
|
Distance measure
|
There are seven different distance measures that can be computed from Raw data:
- Squared Euclidean distances
- Euclidean distances
- City-block (Manhattan) distances
- Chebychev distance metric
- Power: SUM(ABS(x-y)^p)^(1/r)
- Percent disagreement
- 1-Pearson r
The joining algorithm starts by first computing a matrix of distances between the objects that are to be clustered. For a detailed description of these distances, refer to Joining (Tree Clustering) Introductory Overview - Distance Measures.
If Distance matrix is selected as the Input file, then Dissimilarities from matrix is automatically selected in the Distance measure box. If the input matrix is a correlation matrix, then the correlations (which denote the degree of similarity) will be transformed to dissimilarities (1 - r).
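As a rough illustration of the measures listed above, the following Python sketch computes six of them for a pair of objects, using the textbook definitions rather than STATISTICA's own code. The function names and sample vectors are hypothetical; the power distance is left out here because it depends on the two additional parameters p and r.

```python
import math


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(squared_euclidean(x, y))


def city_block(x, y):
    # Sum of absolute differences across dimensions.
    return sum(abs(a - b) for a, b in zip(x, y))


def chebychev(x, y):
    # Largest absolute difference on any single dimension.
    return max(abs(a - b) for a, b in zip(x, y))


def percent_disagreement(x, y):
    # Proportion of dimensions on which the two objects differ.
    return sum(a != b for a, b in zip(x, y)) / len(x)


def one_minus_pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


# Made-up objects measured on three dimensions.
x = [1, 2, 3]
y = [4, 6, 3]

print(squared_euclidean(x, y), euclidean(x, y))
print(city_block(x, y), chebychev(x, y))
print(percent_disagreement(x, y))
print(one_minus_pearson_r([1, 2, 3], [2, 4, 6]))
```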
|
Power distance parameters
|
If the Power distances option is selected in the Distance measure box, specify the two parameters p and r for the power distance in these boxes.
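The effect of the two parameters can be seen in a short Python sketch of the formula SUM(ABS(x-y)^p)^(1/r). The function name power_distance and the sample values are hypothetical and are used only for illustration.

```python
def power_distance(x, y, p, r):
    """Power distance: (sum of |x_i - y_i|^p) raised to the power 1/r."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)


# Made-up objects measured on two dimensions.
x = [0, 0]
y = [3, 4]

print(power_distance(x, y, p=2, r=2))  # same as the Euclidean distance
print(power_distance(x, y, p=1, r=1))  # same as the city-block distance
print(power_distance(x, y, p=2, r=1))  # same as the squared Euclidean distance
```

Here p controls how strongly differences on individual dimensions are weighted, while r controls how strongly larger overall differences between objects are weighted. When p and r are both 2, the measure reduces to the Euclidean distance.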
|
Batch processing and reporting
|
If you select the Batch processing and reporting check box, STATISTICA automatically performs the analysis (after you click the OK button) and sends the entire output from the analysis to a workbook, individual windows, and/or a report (depending on the options selected in the Analysis/Graph Output Manager).
|