Distance Measures Overview


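The following measures can be used to calculate the distance or similarity between rows or columns: Euclidean distance, City block distance, Square Euclidean distance, Half square Euclidean distance, Correlation, Cosine correlation, and Tanimoto coefficient.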

The term dimension is used in all distance measures. The concept is easiest to picture for the physical position of a point in three-dimensional space, where the positions on the x, y and z axes are the three dimensions of the point. However, the data in a dimension can be of any type. If, for example, you describe a group of people by their height, their age and their nationality, this is also a three-dimensional system. For a row (or column), the number of dimensions is equal to the number of variables in the row (or column).
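As a minimal sketch of the paragraph above, the following Python snippet counts dimensions for a small table. The table contents and the helper names row_dimensions and column_dimensions are hypothetical, and reading "the number of variables in a column" as the number of values the column holds is an assumption made for illustration.

```python
# Hypothetical table: each inner list is one row (one person); each position
# is one variable (height in cm, age in years, nationality).
table = [
    [172.0, 34, "SE"],
    [181.5, 29, "US"],
    [165.2, 41, "JP"],
    [158.9, 52, "SE"],
]


def row_dimensions(table):
    """Dimensions of a row: the number of variables (columns) it contains."""
    return len(table[0])


def column_dimensions(table):
    """Dimensions of a column: the number of values (rows) it contains.

    Assumption: this is how "variables in the column" is interpreted here.
    """
    return len(table)


print(row_dimensions(table))
print(column_dimensions(table))
```

Under this reading, comparing two rows means comparing two points in a three-dimensional space, while comparing two columns means comparing two points in a four-dimensional space.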

Note: The result from a cluster calculation is presented either as the distance between the clustered rows or columns, or as the similarity between them. Euclidean distance, City block distance, Square Euclidean distance, and Half square Euclidean distance present the distance between the rows or columns. Correlation, Cosine correlation, and Tanimoto coefficient, on the other hand, present the similarity between the rows or columns.
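To make the difference between distance and similarity measures concrete, the sketch below computes all seven measures for two numeric rows. It uses common textbook definitions, which are assumptions and may differ in detail from the formulas given on the individual measure pages. In particular, Half square Euclidean distance is taken as half the sum of squared differences, and Tanimoto coefficient as a·b / (a·a + b·b − a·b). All function names are hypothetical.

```python
import math


def dot(a, b):
    """Sum of element-wise products of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Distance measures: 0 for identical rows, larger for more different rows.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def square_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def half_square_euclidean(a, b):
    # Assumed definition: half of the square Euclidean distance.
    return 0.5 * square_euclidean(a, b)


# Similarity measures: larger values mean more similar rows.

def correlation(a, b):
    # Pearson correlation: cosine of the mean-centered rows.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    return dot(da, db) / math.sqrt(dot(da, da) * dot(db, db))


def cosine_correlation(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def tanimoto(a, b):
    # Assumed definition for real-valued data.
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


row_a = [1.0, 2.0, 3.0]
row_b = [2.0, 4.0, 6.0]

for measure in (euclidean, city_block, square_euclidean, half_square_euclidean,
                correlation, cosine_correlation, tanimoto):
    print(measure.__name__, measure(row_a, row_b))
```

In this example row_b is exactly twice row_a, so Correlation and Cosine correlation report maximal similarity even though every distance measure is greater than zero. This illustrates why the choice of measure matters: the distance measures are sensitive to magnitude, while these two similarity measures compare only the shape or direction of the rows.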

Note: When used in clustering, the similarity measures Correlation, Cosine correlation, and Tanimoto coefficient may be transformed using 1 - similarity value, so that the result is always greater than or equal to zero.
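The transformation in the note above can be sketched as follows. The function name similarity_to_distance is hypothetical, and the example assumes a similarity value that never exceeds 1, as is the case for a correlation in the range -1 to 1.

```python
def similarity_to_distance(similarity):
    """Transform a similarity value into a non-negative value: 1 - similarity.

    Assumes similarity <= 1, so the result is always >= 0.
    """
    return 1.0 - similarity


# Perfectly similar rows end up at distance 0; a correlation of -1
# (perfectly opposite rows) ends up at the largest value, 2.
for s in (1.0, 0.25, 0.0, -1.0):
    print(s, similarity_to_distance(s))
```

After this transformation, smaller values mean "more alike" for every measure, so the clustering can treat the similarity measures the same way as the distance measures.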

See also:

Overview of Hierarchical Clustering Theory