Dendrograms and clustering
A dendrogram is a tree-structured graph used in heat maps to visualize the result of a hierarchical clustering calculation. Depending on the selected distance measure, the result of the clustering is presented as either the distance or the similarity between the clustered rows or columns.
Row dendrograms

The individual rows in the clustered data are represented by the right-most nodes, the leaf nodes, in the row dendrogram. Each node in the dendrogram represents a cluster of all rows that lie to the right of it in the dendrogram. The left-most node in the dendrogram is therefore a cluster that contains all rows. The vertical dotted line is the pruning line, which can be dragged sideways in the dendrogram. The values next to the pruning line indicate the number of clusters starting from the current position of the line, as well as the calculated distance or similarity at that position. In the example above, the calculated distance is 1.59, and there are three clusters starting at the position of the pruning line. The upper two, indicated by pink circles, contain two or more rows, while the lower cluster contains only one individual row.
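The effect of the pruning line can be sketched in a few lines of Python. This is an illustration only and not a description of how Spotfire computes the values: the merge list, the leaf numbering, and the function name clusters_at are hypothetical, and the distances are made up so that a cut at 1.59 yields three clusters, two with several rows and one with a single row.

```python
# Hypothetical merge list for five rows (leaves 0..4).
# Each entry is (cluster_a, cluster_b, distance); the i-th merge
# creates a new cluster with id n_leaves + i.
merges = [
    (0, 1, 0.5),   # creates cluster 5
    (2, 3, 1.0),   # creates cluster 6
    (5, 6, 2.0),   # creates cluster 7
    (7, 4, 3.0),   # creates cluster 8, which contains all rows
]


def clusters_at(n_leaves, merges, threshold):
    """Return the clusters (lists of leaf ids) left after cutting at `threshold`.

    Assumes merge distances never decrease from one merge to the next.
    """
    parent = list(range(n_leaves + len(merges)))

    def find(x):
        # Follow parent links up to the current root of x.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Apply only the merges that lie at or below the pruning distance.
    for i, (a, b, dist) in enumerate(merges):
        if dist <= threshold:
            new_id = n_leaves + i
            parent[find(a)] = new_id
            parent[find(b)] = new_id

    groups = {}
    for leaf in range(n_leaves):
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(groups.values())


pruned = clusters_at(5, merges, 1.59)
print(len(pruned), pruned)
```

Moving the threshold toward the root (a larger distance) merges more rows into fewer clusters, which corresponds to dragging the pruning line toward the left-most node of a row dendrogram.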
Column dendrograms

At the position of the pruning line in the above example, there are two clusters. The left-most cluster contains two columns, while the right-most cluster contains only one individual column. The calculated distance is 6.08.
Interacting with the dendrogram

Clustering

You can position the dendrogram on different sides of the visualization using the visualization properties, as well as make other updates to the settings. The option Use log scale changes the scale from a linear scale to a logarithmic base 10 scale, log10(x), and Show pruning line specifies whether to show the pruning line in the dendrogram. You can also specify the Pruning line color, the First alternating cluster color and the Second alternating cluster color in the properties.
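As a rough illustration of what a base 10 logarithmic scale does to the calculated values, the following Python sketch transforms a few distances with log10. The list name heights is hypothetical, 1.59 and 6.08 are the distances quoted in the examples in this topic, and 100.0 is a made-up value. How Spotfire treats a distance of zero on the log scale is not covered here.

```python
import math

# Distances on the linear scale (100.0 is a made-up value).
heights = [1.59, 6.08, 100.0]

# Position of each distance on a base 10 logarithmic scale.
log_heights = [math.log10(h) for h in heights]

for h, lg in zip(heights, log_heights):
    print(f"{h:>7.2f} -> {lg:.3f}")
```

The order of the values is preserved, but large distances are compressed relative to small ones, which can make the lower levels of a dendrogram easier to read.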
Importing and exporting dendrograms
All dendrograms in Spotfire can be represented by a data table. This makes it possible to use clustering methods and statistical calculations other than those included in the Edit Clustering Settings dialog. For example, you can use data functions to execute a custom-made R script that performs a clustering with a method of your choice. More specifically, you can use any calculation that can order leaves in a hierarchical fashion. The result of such a procedure is a data table, which you can add to the analysis and then import into the heat map to show a dendrogram.
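To make the idea of "any calculation that can order leaves in a hierarchical fashion" concrete, here is a minimal sketch of a custom clustering, written in Python for illustration rather than as an R data function. It performs a naive single-linkage clustering with Euclidean distance and returns a list of merges. The function name single_linkage, the sample rows, and the (cluster_a, cluster_b, distance) layout are assumptions made for this example. The layout is not the Spotfire dendrogram table format, so the output would still need to be mapped to the columns described in Dendrogram data table format.

```python
import math


def single_linkage(points):
    """Naive agglomerative clustering (single linkage, Euclidean distance).

    Returns a list of merges (cluster_a, cluster_b, distance). Leaves are
    numbered 0..n-1 and the i-th merge creates cluster id n + i.
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member leaf ids
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the pair of clusters whose closest members are nearest.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = min(math.dist(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        # Replace the two clusters with their union under a new id.
        clusters[n + len(merges)] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


# Made-up sample rows with two numeric columns each.
rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0), (20.0, 0.0)]
result = single_linkage(rows)
for merge in result:
    print(merge)
```

The pairwise search makes this approach slow for large inputs. It is meant only to show that the outcome of a hierarchical method is a small table of merges and distances, which is the kind of information a dendrogram draws.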
Another reason for exporting a dendrogram to a data table, and later importing it again, is performance. If you cluster a very large data set, the calculations can take some time. Once a clustering has been run and is used in a dendrogram, you can export the dendrogram and later import it without having to run the clustering again.
The data table representation of a dendrogram used in Spotfire must adhere to a certain format. This format is described in Dendrogram data table format.
Concerning R
R is available under separate open source software license terms and is not part of Spotfire. As such, R is not within the scope of your license for Spotfire. R is not supported, maintained, or warranted in any way by Cloud Software Group, Inc. Download and use of R is solely at your own discretion and subject to the free open source license terms applicable to R.
- Dendrogram data table format
A dendrogram can be imported and used in a heat map via a data table, provided that it follows the format specified in this topic.