Distance measures
A number of different measures can be used to calculate the distance or similarity between rows or columns.
The following measures are available for the Hierarchical clustering tool:
- Correlation
- Cosine correlation
- Tanimoto coefficient
- Euclidean distance
- City block distance
- Square Euclidean distance
- Half Square Euclidean distance
The term dimension is used in all distance measures. The concept is simple when describing the physical position of a point in three-dimensional space, where the positions on the x, y and z axes are the dimensions of the point. However, the data in a dimension can be of any type. If, for example, you describe a group of people by their height, their age and their nationality, then this is also a three-dimensional system. For a row (or column), the number of dimensions is equal to the number of variables in the row (or column).
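To make the idea of dimension concrete, here is a minimal sketch using a small, made-up data table. The values and the helper names (`row_dimensions`, `column_dimensions`, `columns`) are hypothetical and only illustrate how rows and columns can each be read as points in a multi-dimensional space.

```python
# Hypothetical data table: 3 rows, each described by 4 numeric variables.
table = [
    [172.0, 34.0, 68.5, 1.0],
    [181.0, 51.0, 80.2, 0.0],
    [165.0, 27.0, 59.9, 1.0],
]


def row_dimensions(table):
    """A row has one dimension per variable, i.e. one per column."""
    return len(table[0])


def column_dimensions(table):
    """A column has one dimension per entry, i.e. one per row."""
    return len(table)


def columns(table):
    """Transpose the table so that each column can be treated as a vector."""
    return [list(col) for col in zip(*table)]


print(row_dimensions(table))     # dimensions of each row
print(column_dimensions(table))  # dimensions of each column
```

Clustering rows compares 4-dimensional points with each other, while clustering columns compares 3-dimensional points, even though both come from the same table.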
- Correlation
Correlation is a common similarity measure when doing hierarchical clustering.
- Cosine correlation
Cosine correlation is a common similarity measure when doing hierarchical clustering.
- Tanimoto coefficient
The Tanimoto coefficient is a common similarity measure when doing hierarchical clustering.
- Euclidean distance
The Euclidean distance is a common distance measure when doing hierarchical clustering.
- City block distance
The City block distance is a common distance measure when doing hierarchical clustering.
- Square Euclidean distance and Half Square Euclidean distance
The Square Euclidean distance and the Half Square Euclidean distance are two further common distance measures when doing hierarchical clustering.
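The exact formulas used by the Hierarchical clustering tool are not given in this section. As a rough illustration, the sketch below implements the standard textbook definitions of the seven measures for two rows (or columns) `x` and `y` of equal length. These definitions are an assumption: the tool may scale or normalize differently. In particular, the continuous form of the Tanimoto coefficient and the reading of "Half Square" as half of the Square Euclidean distance are assumed here. All function names are hypothetical.

```python
import math


def dot(x, y):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def cosine_correlation(x, y):
    """Cosine of the angle between x and y (undefined for all-zero vectors)."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))


def correlation(x, y):
    """Pearson correlation: the cosine measure applied to mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_correlation([a - mean_x for a in x], [b - mean_y for b in y])


def tanimoto(x, y):
    """Tanimoto coefficient in its continuous form (assumed definition)."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def square_euclidean(x, y):
    """Sum of squared differences over all dimensions."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def half_square_euclidean(x, y):
    """Half of the Square Euclidean distance (assumed definition)."""
    return 0.5 * square_euclidean(x, y)


def euclidean(x, y):
    """Straight-line distance: square root of the Square Euclidean distance."""
    return math.sqrt(square_euclidean(x, y))


def city_block(x, y):
    """Sum of absolute differences over all dimensions."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two made-up three-dimensional rows for illustration.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
for measure in (correlation, cosine_correlation, tanimoto, euclidean,
                city_block, square_euclidean, half_square_euclidean):
    print(measure.__name__, measure(x, y))
```

Correlation, Cosine correlation and the Tanimoto coefficient are similarities, so larger values mean more alike. The remaining four are distances, so smaller values mean more alike.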