K-means clustering
K-means clustering is an algorithm for partitioning a data table into subsets (clusters), in such a way that the members of each cluster are relatively similar.
K-means clustering in Spotfire is based on a line chart visualization. The line chart must be configured either so that each line corresponds to one row in the root view of the data table or, if the line chart is aggregated, so that there is a one-to-many mapping between lines and rows in the root view. The columns that the clustering is based on are the ones specified in the line chart used as the starting point.
The clustering is initialized using data centroid based search, and it uses unit weights and either correlation or Euclidean distance as the distance measure. The clustering is always performed on the filtered rows only, that is, the rows remaining after the current filtering. If you want all rows to be included in the clustering, you must reset all filters before clustering.
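The general idea can be illustrated with a small Python sketch. This is not Spotfire's implementation. It assumes that each line is given as a list of numbers, that correlation distance means 1 minus the Pearson correlation, and that seeding starts from the line closest to the overall centroid and then picks the lines farthest from those already chosen. That seeding is only a stand-in for data centroid based search, and all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two lines of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation (assumed convention, not confirmed by the source)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 1.0  # flat line: treat the correlation as 0
    return 1.0 - cov / math.sqrt(va * vb)

def mean_profile(lines):
    """Point-by-point mean of a group of lines (unit weights)."""
    n = len(lines)
    return [sum(col) / n for col in zip(*lines)]

def seed_centroids(lines, k, dist):
    """Hypothetical seeding: start near the overall centroid, then spread out."""
    center = mean_profile(lines)
    first = min(range(len(lines)), key=lambda i: dist(lines[i], center))
    chosen = [first]
    while len(chosen) < k:
        nxt = max(
            (i for i in range(len(lines)) if i not in chosen),
            key=lambda i: min(dist(lines[i], lines[j]) for j in chosen),
        )
        chosen.append(nxt)
    return [list(lines[i]) for i in chosen]

def kmeans(lines, k, dist=euclidean, max_iter=100):
    """Assign each line to its nearest centroid until the assignment is stable."""
    centroids = seed_centroids(lines, k, dist)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: dist(line, centroids[c]))
            for line in lines
        ]
        if new_labels == labels:
            break  # no line changed cluster
        labels = new_labels
        for c in range(k):
            members = [lines[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_profile(members)
    return labels, centroids

# Two rising and two falling lines (made-up values for illustration).
lines = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [4.2, 2.9, 2.1, 0.9],
]
labels, centroids = kmeans(lines, 2)
corr_labels, _ = kmeans(lines, 2, dist=correlation_distance)
print(labels, corr_labels)
```

With this data, both distance measures place the two rising lines in one cluster and the two falling lines in the other. Euclidean distance compares the values themselves, whereas correlation distance compares only the shape of the lines, so the two measures can give different clusters on data where lines have similar shapes at different levels.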
References:
Mirkin, B. (1996). Mathematical Classification and Clustering. Nonconvex Optimization and Its Applications, Volume 11, Pardalos, P. and Horst, R., editors. Kluwer Academic Publishers, The Netherlands.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Le Cam, L. M. and Neyman, J., editors, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Volume I: Statistics, pages 281-297. University of California Press, Berkeley and Los Angeles, CA.
Hair, J.F. Jr., Anderson, R.E., Tatham, R.L., and Black, W.C. (1995). Multivariate Data Analysis, Fourth Edition. Prentice Hall, Englewood Cliffs, New Jersey.
- Performing K-means clustering
The K-means clustering tool is used to group the lines in a line chart into different groups (clusters) of similar lines.