Workspace Node: K-Means Clustering - Specifications - Advanced Tab

In the K-Means Clustering node dialog box, under the Specifications heading, select the Advanced tab to access the following options.

Element Name	Description
Variables	Click this button to display a variable selection dialog box, in which you select variables for the analysis.
Cluster	The Cluster box contains two options: Variables (columns) and Cases (rows). The option you select determines how Statistica interprets the selected Variables (see above).
Variables (columns)	If Variables (columns) is selected, Statistica interprets the selected Variables as the objects to be clustered.
Cases (rows)	If Cases (rows) is selected, Statistica interprets the selected Variables as dimensions.
Number of clusters	Enter the desired number of clusters, which must be greater than 1 and less than the number of objects (i.e., cases or variables, depending on the selection in the Cluster box; see above). The purpose of the k-means clustering procedure is to classify objects into a user-specified number of clusters. The algorithm moves objects into different clusters with the goal of minimizing the within-cluster variability while maximizing the between-cluster variability. For a further discussion of this method, refer to the Cluster Analysis Overview.
Number of iterations	Specify the maximum number of iterations that can be performed. k-means clustering is an iterative procedure; in each iteration, objects are moved into different clusters. The algorithm implemented in the Cluster Analysis module is very efficient, and the default setting (10 iterations) usually does not need to be changed.
Initial cluster centers	This group box contains three options (described below). Use these options to specify the way in which the initial cluster centers are computed. Note that the results from the k-means clustering method depend to some extent on the initial configuration (i.e., cluster means or centers). This is particularly the case when there are many small clusters (with few objects) that are clearly distinct.
Choose observations to maximize initial between-cluster distances	When you select this option button, observations or objects will be set as the initial cluster centers; the objects are chosen according to rules that maximize the initial between-cluster distances. Specifically, 1) the program will select the first N (number of clusters) cases to be the respective cluster centers; 2) subsequent cases will replace previous cluster centers if their smallest distance to any of the cluster centers is larger than the smallest distance between clusters; if this is not the case, then 3) subsequent cases will replace initial cluster centers if their smallest distance from a cluster center is larger than the distance of that cluster center from any other cluster center. The effect of this selection procedure is to maximize the initial distances between clusters. Note that this procedure may yield clusters with single observations if there are clear outliers in the data.
Sort distances and take observations at constant intervals	When you select this option button, the distances between all objects will first be sorted, and then objects at constant intervals will be chosen as initial cluster centers.
Choose the first N (Number of clusters) observations	When you select this option button, the first N (number of clusters) observations will be the initial cluster centers. Thus, this option provides full control over the choice of the initial configuration. This is often useful if you bring a priori expectations regarding the nature of the clusters to the analysis. In that case, move the cases that you want to choose as the initial cluster centers to the beginning of the file.
Options / C / W	See Common Options.
OK	Click the OK button to accept all the specifications made in the dialog box and to close it. The analysis results will be placed in the Reporting Documents node after running (updating) the project.
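
The interplay of the Cluster, Number of clusters, Number of iterations, and Choose the first N (Number of clusters) observations options can be illustrated with a small sketch. This is not Statistica's implementation: it is a textbook k-means loop in plain Python, and the names transpose, kmeans, and max_iter are hypothetical. It assumes Euclidean distances, takes the first k objects as initial centers, and stops when no object changes cluster or when the iteration limit is reached.

```python
import math


def transpose(table):
    """Swap rows and columns, so that variables (columns) become the objects."""
    return [list(column) for column in zip(*table)]


def kmeans(objects, k, max_iter=10):
    """Textbook k-means sketch; the first k objects are the initial centers."""
    if not 1 < k < len(objects):
        raise ValueError("k must be greater than 1 and less than the number of objects")
    centers = [list(obj) for obj in objects[:k]]
    labels = [None] * len(objects)
    for _ in range(max_iter):
        # Assign every object to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(obj, centers[c]))
            for obj in objects
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the solution is stable
        labels = new_labels
        # Recompute each center as the mean of the objects assigned to it.
        for c in range(k):
            members = [obj for obj, lab in zip(objects, labels) if lab == c]
            if members:
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centers


# Cases (rows) are the objects; the two columns are the dimensions.
data = [
    [1.0, 1.0],
    [9.0, 9.0],
    [1.2, 0.8],
    [8.8, 9.1],
    [0.9, 1.1],
]
labels, centers = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

To cluster variables instead of cases in this sketch, pass transpose(data) to kmeans; the requirement that k be less than the number of objects then applies to the number of variables.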

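The three-step rule described for the Choose observations to maximize initial between-cluster distances option leaves some details open, so the following Python sketch is only one possible reading of it, not the program's actual code. The function name init_max_distance is hypothetical. Two points are assumptions: in step 2 the case replaces whichever of the two closest centers lies nearer to it, and in step 3 the comparison uses the case's distance to its second-nearest center (a common variant of this kind of rule).

```python
import math
from itertools import combinations


def init_max_distance(cases, k):
    """Pick k initial centers that are far apart (one reading of the rule)."""
    if k < 2 or k > len(cases):
        raise ValueError("k must be at least 2 and at most the number of cases")
    # Step 1: the first k cases start out as the cluster centers.
    centers = [list(case) for case in cases[:k]]
    for case in cases[k:]:
        d_to = [math.dist(case, center) for center in centers]
        nearest = min(range(k), key=d_to.__getitem__)
        # The two current centers that are closest to each other.
        i, j = min(
            combinations(range(k), 2),
            key=lambda pair: math.dist(centers[pair[0]], centers[pair[1]]),
        )
        if d_to[nearest] > math.dist(centers[i], centers[j]):
            # Step 2. ASSUMPTION: replace the member of the closest pair
            # that is nearer to the new case.
            centers[i if d_to[i] <= d_to[j] else j] = list(case)
        else:
            # Step 3. ASSUMPTION: compare the distance to the second-nearest
            # center with the nearest center's smallest distance to any
            # other center.
            second = sorted(d_to)[1]
            gap = min(
                math.dist(centers[nearest], centers[m])
                for m in range(k)
                if m != nearest
            )
            if second > gap:
                centers[nearest] = list(case)
    return centers


cases = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [0.5, 0.0], [10.0, 10.0]]
print(init_max_distance(cases, 2))  # [[0.0, 0.0], [10.0, 10.0]]
```

In this example the outlying case (10, 10) ends up as a center, which illustrates why the procedure may produce single-observation clusters when the data contain clear outliers.
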
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.