TIBCO Cloud™ Spotfire® User Guide

Dendrograms and clustering

A dendrogram is a tree-structured graph used in heat maps to visualize the result of a hierarchical clustering calculation. The result of a clustering is presented either as the distance or as the similarity between the clustered rows or columns, depending on the selected distance measure.

Note: Dendrograms must be authored in the installed client.
For information about the available distance measures, see Distance measures and the detailed description of each measure. You can perform hierarchical clustering on an existing heat map by opening the Dendrograms page of the Visualization Properties dialog in the installed client. You can also use the Hierarchical clustering tool to cluster with a data table as the input. To learn more about hierarchical clustering and heat maps, see Hierarchical clustering and Heat map, respectively. Only numeric columns are included when clustering.
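The following Python sketch illustrates two points from the paragraph above: only numeric columns take part in the clustering, and the clustering works on distances between rows. It is not Spotfire code. The example table, the helper names (numeric_columns, distance_matrix) and the choice of Euclidean distance are assumptions for illustration; Euclidean distance is only one of several possible measures.

```python
import math

# Hypothetical example table: each row maps column name -> value.
rows = [
    {"Name": "Test A", "X": 1.0, "Y": 2.0, "Batch": "b1"},
    {"Name": "Test B", "X": 4.0, "Y": 6.0, "Batch": "b2"},
    {"Name": "Test C", "X": 1.0, "Y": 3.0, "Batch": "b1"},
]

def numeric_columns(table):
    """Return the names of columns whose values are all numbers (bools excluded)."""
    return [c for c in table[0]
            if all(isinstance(r[c], (int, float)) and not isinstance(r[c], bool)
                   for r in table)]

def euclidean(a, b):
    """Euclidean distance between two equally long numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(table):
    """Pairwise row distances, computed on the numeric columns only."""
    cols = numeric_columns(table)
    vectors = [[r[c] for c in cols] for r in table]
    return [[euclidean(u, v) for v in vectors] for u in vectors]

d = distance_matrix(rows)
print(numeric_columns(rows))  # ['X', 'Y']; "Name" and "Batch" are skipped
```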
Tip: Dendrograms can be exported or imported from the Dendrograms page in the Visualization Properties dialog.
Note: The dendrogram must be updated before it becomes visible for the first time, and again after any clustering settings have been changed. In the visualization properties, click the Update button or select automatic updates.

Row dendrograms

The row dendrogram shows the distance or similarity between rows and which nodes each row belongs to, as a result of clustering. An example of a row dendrogram is shown below.

The individual rows in the clustered data are represented by the right-most nodes (the leaf nodes) in the row dendrogram. Each node in the dendrogram represents a cluster of all the rows in the branches to the right of it. The left-most node in the dendrogram is therefore a cluster that contains all rows. The vertical dotted line is the pruning line, which can be dragged sideways in the dendrogram. The values next to the pruning line indicate the number of clusters starting from the current position of the line, as well as the calculated distance or similarity at that position. In the example above, the calculated distance is 1.59, and there are three clusters starting at the position of the pruning line. The upper two, indicated by pink circles, contain two or more rows, while the lower cluster contains only one individual row.
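To make the relationship between the pruning line and the cluster count concrete, here is a minimal Python sketch. It is not how Spotfire computes its dendrograms: it assumes average linkage (UPGMA) as the clustering method, uses hypothetical one-dimensional data, and numbers nodes so that leaves are 0..n-1 and the node created by the k-th merge is n + k (similar to SciPy's linkage convention). Whether a merge exactly at the pruning line counts as joined is also an assumption.

```python
import itertools

def average_linkage(dist):
    """Naive average-linkage (UPGMA) clustering of a square distance matrix.

    Returns merges as (left, right, height). Leaves are 0..n-1 and the
    node created by the k-th merge gets id n + k.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}  # active node id -> leaf indices
    merges = []
    next_id = n
    while len(members) > 1:
        best = None
        # Find the pair of active clusters with the smallest average distance.
        for a, b in itertools.combinations(sorted(members), 2):
            avg = sum(dist[i][j] for i in members[a] for j in members[b]) / (
                len(members[a]) * len(members[b]))
            if best is None or avg < best[2]:
                best = (a, b, avg)
        a, b, height = best
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges

def clusters_at(merges, n_leaves, cut):
    """Number of clusters at a pruning line placed at distance `cut`.

    Each merge at or below the cut joins two clusters into one, so the count
    is the number of leaves minus those merges (average linkage never
    produces a merge lower than an earlier one).
    """
    return n_leaves - sum(1 for _, _, h in merges if h <= cut)

points = [0.0, 1.0, 5.0, 6.0, 20.0]  # hypothetical one-dimensional rows
dist = [[abs(x - y) for y in points] for x in points]
merges = average_linkage(dist)
print(clusters_at(merges, len(points), 1.59))  # 3
```

Moving the cut plays the role of dragging the pruning line: a larger cut absorbs more merges and yields fewer clusters.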

Column dendrograms

The column dendrogram is drawn in the same way as the row dendrogram, but shows the distance or similarity between the variables (the cell value columns).
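Clustering columns uses the same procedure as clustering rows, applied to the transposed data: each cell value column becomes one vector. The Python sketch below shows this with hypothetical values and Euclidean distance, which is an assumed measure chosen for illustration.

```python
import math

# Hypothetical heat map values: one list per row, one entry per cell value column.
values = [
    [1.0, 2.0, 10.0],
    [2.0, 3.0, 12.0],
    [3.0, 4.0, 11.0],
]

def transpose(matrix):
    """Turn columns into rows so each column becomes one vector to cluster."""
    return [list(col) for col in zip(*matrix)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def column_distances(matrix):
    """Pairwise distances between the columns of `matrix`."""
    cols = transpose(matrix)
    return [[euclidean(c1, c2) for c2 in cols] for c1 in cols]

cd = column_distances(values)
# The first two columns are closest, so they would be joined first in a column dendrogram.
```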

At the position of the pruning line in the above example, there are two clusters. The left-most cluster contains two columns, while the right-most cluster contains only one individual column. The calculated distance is 6.08.

Interacting with the dendrogram

The dendrogram makes it easy to highlight and mark items in the heat map. Hover over the dendrogram to highlight a cluster and its corresponding cells in the heat map. Click a cluster to mark it; this also marks the corresponding cells in the heat map, as in the example below. The tooltip shows information about the cluster.
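Marking a cluster amounts to selecting every leaf row in the subtree under the clicked node. The Python sketch below shows that lookup on a hypothetical hardcoded tree. The merge-list representation, where entry k is (left, right, height) and creates node n + k, is an assumption for illustration and not Spotfire's internal structure.

```python
def leaves_under(node, merges, n_leaves):
    """Return the leaf (row) indices in the subtree rooted at `node`."""
    if node < n_leaves:
        return [node]  # a leaf is a single row
    left, right, _ = merges[node - n_leaves]
    return leaves_under(left, merges, n_leaves) + leaves_under(right, merges, n_leaves)

# Hypothetical tree over 5 rows; nodes 5..8 are created by the four merges.
merges = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 5.0), (4, 7, 17.0)]
marked = leaves_under(7, merges, 5)  # rows 0, 1, 2 and 3
```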

Clustering

As mentioned, a dendrogram is added to the heat map when clustering is performed. A new column is also added to the data table and made available as a filter. The cluster column is dynamic: the position of the pruning line determines its content. The example below shows what the cluster column and cluster filter would look like for the row dendrogram above.

The cluster column contains unique identifiers for the cluster nodes corresponding to the position of the pruning line. In the example above, the column contains three identifiers. Test B, Test C, and Test F belong to the cluster node with identifier 3, while Test A and Test E belong to the cluster node with identifier 5. The third identifier, *6, is a leaf node that contains only Test D. The cluster column makes it possible to filter out entire clusters at a time. You can also use it to color or trellis other visualizations.
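The Python sketch below shows how such a cluster column could be derived: each row is labeled with the topmost node it has been merged into at or below the pruning line. The tree is a hypothetical hardcoded merge list (entry k is (left, right, height) and creates node n + k), so the identifiers will not match the ones Spotfire assigns. Prefixing single-row clusters with "*" only mimics the example above and is an assumption.

```python
def cluster_column(merges, n_leaves, cut):
    """Label each row with the id of its topmost node at or below `cut`.

    Rows that are not merged with anything below the cut keep their own
    leaf id, prefixed with '*' (mimicking the example; an assumption).
    Assumes merge heights never decrease, so the loop can stop early.
    """
    label = list(range(n_leaves))  # current top node for each row
    for k, (left, right, height) in enumerate(merges):
        if height > cut:
            break
        new_id = n_leaves + k
        label = [new_id if node in (left, right) else node for node in label]
    return [str(node) if node >= n_leaves else f"*{node}" for node in label]

merges = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 5.0), (4, 7, 17.0)]
print(cluster_column(merges, 5, 1.59))  # ['5', '5', '6', '6', '*4']
```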
Note: If you add a column dendrogram to a heat map that is configured with multiple cell value columns, then the cluster column cannot show any cluster IDs. This means that the cluster column cannot be used for filtering, or to color or trellis other visualizations by. Also, the column dendrogram will not be fully interactive. For instance, it might not be possible to use the dendrogram to highlight or mark in the heat map. However, you can still move the pruning line to see the calculated distance or similarity, as well as the number of clusters.

In the visualization properties, you can position the dendrogram on different sides of the visualization and change other settings. The Use log scale option changes the scale from linear to logarithmic base 10, log10(x), and Show pruning line specifies whether the pruning line is shown in the dendrogram. You can also specify the Pruning line color, the First alternating cluster color, and the Second alternating cluster color.
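As a small illustration of what Use log scale means for the axis, the Python sketch below maps values onto a base-10 logarithmic axis, where each tenfold increase takes the same amount of space. The function name is hypothetical, and how Spotfire handles values of zero or less on a log scale is not covered here; the sketch simply rejects them because log10 is undefined there.

```python
import math

def log_scale_position(value):
    """Position of a distance or similarity value on a log10 axis."""
    if value <= 0:
        raise ValueError("log10 is undefined for values <= 0")
    return math.log10(value)

# 1, 10 and 100 end up equally far apart on the log10 axis.
positions = [log_scale_position(v) for v in (1.0, 10.0, 100.0)]
```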

Importing and exporting dendrograms

All dendrograms in Spotfire can be represented by a data table. This makes it possible to use clustering methods and statistical calculations other than those included in the Edit Clustering Settings dialog. For example, you can use data functions to execute a custom-made R script that performs a clustering with a method of your choice. More specifically, you can use any calculation that can order leaves in a hierarchical fashion. The result of such a procedure is a data table, which you can add to the analysis, import to the heat map, and use to show a dendrogram.
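The Python sketch below shows the general idea of flattening a hierarchical clustering into one row per node. The column names (NodeId, ParentId, Height, Name) and the merge-list input are hypothetical and are NOT the Spotfire dendrogram data table format; the required format is described in Dendrogram data table format.

```python
def merges_to_table(merges, leaf_names):
    """Flatten a merge list into one row per node (hypothetical layout).

    Entry k of `merges` is (left, right, height) and creates node n + k,
    where n is the number of leaves.
    """
    n = len(leaf_names)
    parent = {}
    height = {i: 0.0 for i in range(n)}  # leaves sit at height 0
    for k, (left, right, h) in enumerate(merges):
        node = n + k
        parent[left] = node
        parent[right] = node
        height[node] = h
    return [
        {
            "NodeId": node,
            "ParentId": parent.get(node),  # None for the root node
            "Height": height[node],
            "Name": leaf_names[node] if node < n else None,
        }
        for node in sorted(height)
    ]

names = ["Test A", "Test B", "Test C", "Test D", "Test E"]
merges = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 5.0), (4, 7, 17.0)]
table = merges_to_table(merges, names)
```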

You can also export a dendrogram from a heat map, view the resulting data table, make modifications, and import it back to the heat map, which in effect modifies the dendrogram.
Tip: To use this data table outside of Spotfire, select File > Export > Data to file, and then select the data table you just created.

Another reason for exporting a dendrogram to a data table, and later importing it again, is performance. Clustering a very large data set can take considerable time. Once you have run a clustering method for a dendrogram, you can export the dendrogram and later import it without having to run the clustering again.
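The same idea, computing once and reusing the stored result, can be sketched outside Spotfire with Python's csv module. The file name and the (left, right, height) column layout are hypothetical and unrelated to the format Spotfire expects on import.

```python
import csv
import os
import tempfile

def save_merges(path, merges):
    """Write (left, right, height) merge rows to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["left", "right", "height"])
        writer.writerows(merges)

def load_merges(path):
    """Read the merge rows back, restoring the numeric types."""
    with open(path, newline="") as f:
        return [(int(r["left"]), int(r["right"]), float(r["height"]))
                for r in csv.DictReader(f)]

# Result of an earlier (possibly slow) clustering run; hypothetical values.
merges = [(0, 1, 1.0), (2, 3, 1.0), (5, 6, 5.0), (4, 7, 17.0)]
path = os.path.join(tempfile.mkdtemp(), "merges.csv")
save_merges(path, merges)
restored = load_merges(path)  # reused later without re-clustering
```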

The data table representation of a dendrogram used in Spotfire must adhere to a certain format. This format is described in Dendrogram data table format.

Concerning R

R is available under separate open source software license terms and is not part of Spotfire. As such, R is not within the scope of your license for Spotfire. R is not supported, maintained, or warranted in any way by Cloud Software Group, Inc. Download and use of R is solely at your own discretion and subject to the free open source license terms applicable to R.