Hierarchical clustering
Hierarchical clustering arranges items in a hierarchy with a treelike structure based on the distance or similarity between them. The graphical representation of the resulting hierarchy is a tree-structured graph called a dendrogram. In Spotfire, hierarchical clustering and dendrograms are strongly connected to heat map visualizations. You can cluster both rows and columns in the heat map. Row dendrograms show the distance or similarity between rows, and which nodes each row belongs to as a result of clustering. Column dendrograms show the distance or similarity between the variables (the selected cell value columns).

You can perform hierarchical clustering in two different ways: by using the Hierarchical clustering tool, or by performing hierarchical clustering on an existing heat map visualization. If you use the Hierarchical clustering tool, a heat map with a dendrogram will be created. To learn more about heat maps and dendrograms, see Heat map and Dendrograms and clustering.
Algorithm
The algorithm used for hierarchical clustering in Spotfire is a hierarchical agglomerative method. For row clustering, the cluster analysis begins with each row placed in a separate cluster. Then the distance between every possible pair of rows is calculated using the selected distance measure. The two most similar clusters are then grouped together to form a new cluster. In subsequent steps, the distance between the new cluster and all remaining clusters is recalculated using the selected clustering method. The number of clusters is thereby reduced by one in each iteration step. Eventually, all rows are grouped into one large cluster. The order of the rows in a dendrogram is defined by the selected ordering weight. The cluster analysis works the same way for column clustering.
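The agglomerative procedure described above can be illustrated with a minimal Python sketch. This is not Spotfire's implementation: it assumes Euclidean distance as the distance measure and single linkage as the clustering method, and the names `euclidean`, `single_linkage`, and `agglomerate` are hypothetical helpers chosen for this example.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Distance measure (assumed here): Euclidean distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2, rows):
    """Clustering method (assumed here): smallest row-to-row distance
    between two clusters, each given as a tuple of row indices."""
    return min(euclidean(rows[i], rows[j]) for i in c1 for j in c2)

def agglomerate(rows, linkage=single_linkage):
    """Naive agglomerative clustering. Returns the merge history as a list
    of (cluster_a, cluster_b, distance) tuples."""
    # Start with every row in its own cluster.
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        # Find the two most similar (closest) clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], rows))
        merges.append((a, b, linkage(a, b, rows)))
        # Replace the pair with one merged cluster: the count drops by one.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = agglomerate(rows)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

With four rows the loop performs three merges, one per iteration, and ends with a single cluster. The sketch recomputes every pairwise cluster distance in each step for clarity; a practical implementation would cache and update a distance matrix instead.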
- Performing clustering with the hierarchical clustering tool
The Hierarchical clustering tool groups rows and/or columns in a data table and arranges them in a heat map visualization with a dendrogram (a tree graph) based on the distance or similarity between them. When using the hierarchical clustering tool, the input is a data table, and the result is a heat map with dendrograms.
- Distance measures
A number of different measures can be used to calculate the distance or similarity between rows or columns.
- Clustering methods
Hierarchical clustering starts by calculating the distance between all possible combinations of two rows or columns using a selected distance measure. These calculated distances are then used to derive the distance between all clusters that are formed from the rows or columns during the clustering.
- Ordering weight
The ordering weight controls in what vertical order the rows are displayed in the row dendrogram. For column dendrograms it controls the horizontal order of the columns. The two subclusters within a cluster (there are always exactly two subclusters) are weighted and the cluster with the lower weight is placed above (to the left of) the other cluster.
- Hierarchical clustering references
The hierarchical clustering tool in the Spotfire client is built using the following references.
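The ordering weight rule from the list above (the subcluster with the lower weight is placed first) can be sketched in Python as follows. This is an illustration only: the function `leaf_order`, the nested-tuple tree representation, and the choice of the leaf-weighted average as a cluster's weight are assumptions made for this example, not a description of how Spotfire computes weights.

```python
def leaf_order(tree, weight):
    """Order the leaves of a binary cluster tree by ordering weight.

    tree   -- an int (row index) or a (left, right) tuple of subtrees
    weight -- per-row weights, indexed by row index (hypothetical input)

    Returns (ordered_leaves, cluster_weight).
    """
    if isinstance(tree, int):
        return [tree], weight[tree]
    left_leaves, left_w = leaf_order(tree[0], weight)
    right_leaves, right_w = leaf_order(tree[1], weight)
    # Place the subcluster with the lower weight first (above / to the left).
    if right_w < left_w:
        left_leaves, right_leaves = right_leaves, left_leaves
        left_w, right_w = right_w, left_w
    n_left, n_right = len(left_leaves), len(right_leaves)
    # Assumed cluster weight: average of the weights of all leaves it contains.
    combined = (left_w * n_left + right_w * n_right) / (n_left + n_right)
    return left_leaves + right_leaves, combined

tree = ((0, 1), (2, 3))
weight = [9.0, 2.0, 1.0, 4.0]
order, total_weight = leaf_order(tree, weight)
print(order)
```

Because every cluster has exactly two subclusters, the ordering reduces to one comparison per node, applied recursively from the leaves up to the root.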