Spotfire® User Guide

Performing clustering with the hierarchical clustering tool

The Hierarchical clustering tool groups rows and/or columns in a data table and arranges them in a heat map visualization with a dendrogram (a tree graph) based on the distance or similarity between them. When using the hierarchical clustering tool, the input is a data table, and the result is a heat map with dendrograms.

About this task

See also Hierarchical clustering.

You can also initiate hierarchical clustering on an existing heat map from the Dendrograms section of the heat map visualization properties. See Heat map to learn more.

Before you begin

Hierarchical clustering must be authored in the installed client.

Procedure

  1. On the menu bar, select Tools > Hierarchical clustering.
  2. If the analysis contains more than one data table, select a Data table to perform the clustering calculation on.
  3. Click Select Columns.
  4. In the Select Columns dialog, add the columns you want to include in the clustering, and then click OK to close the dialog.
  5. Select the Cluster rows check box if you want to create a row dendrogram.
  6. Click the Settings button to edit the clustering settings.
  7. In the Edit Clustering Settings dialog, select a Clustering method.
    For more information on clustering methods, see Clustering methods.
    Note: If you select Ward's method as the clustering method, the distance measure will automatically be set to Half square Euclidean distance. No other distance measure can be used with Ward's method.
  8. Select a Distance measure.
    For more information on distance measures, see Distance measures. Distances exceeding 3.40282e+038 (approximately the largest value a single-precision floating-point number can hold) cannot be represented.
  9. Select the Ordering weight to use in the clustering calculation.
    For more information, see Ordering weight.
  10. Select an empty value replacement Method from the drop-down list:
    Option | Description
    Constant value | Replaces the empty value with a constant number that you specify.
    Column average | Replaces the empty value with the average of the corresponding column values. If the column contains only empty values, they are replaced by 0, because it is not possible to calculate an average.
    Row average | Replaces the empty value with the average value of the entire row. If the row contains only empty values, they are replaced by 0, because it is not possible to calculate an average.
    Row interpolation | Sets the missing value to the interpolated value between the two neighboring values in the row.
  11. Select a normalization Method to use in the clustering calculation.
    For more information, see Normalizing columns. If you normalize by percentile, you must also specify a percentage.
  12. Click OK to close the Edit Clustering Settings dialog.
  13. Select the Cluster columns check box if you want to create a column dendrogram.
  14. Repeat steps 6 to 12 to define the settings for the column dendrogram.
  15. Click OK.
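
The empty value replacement options in step 10 can be illustrated with a short Python sketch. This is not Spotfire code and does not call any Spotfire API; it only mirrors the descriptions in the option table. The function names replace_empty and _interpolate, the method strings, and the use of None for an empty value are all hypothetical. The table does not say what row interpolation does when a neighbor is missing, so the sketch assumes the nearest available value is copied, and that 0 is used when the whole row is empty.

```python
from statistics import mean


def _interpolate(row, j, fallback):
    """Linearly interpolate row[j] from its nearest non-empty neighbors.

    Assumption (not stated in the guide): with only one neighbor available,
    copy it; with none, return the fallback value.
    """
    left = next((k for k in range(j - 1, -1, -1) if row[k] is not None), None)
    right = next((k for k in range(j + 1, len(row)) if row[k] is not None), None)
    if left is not None and right is not None:
        t = (j - left) / (right - left)
        return row[left] + t * (row[right] - row[left])
    if left is not None:
        return row[left]
    if right is not None:
        return row[right]
    return fallback


def replace_empty(rows, method, constant=0.0):
    """Return a copy of rows (a list of equal-length lists) with None filled in."""
    n_cols = len(rows[0])
    # Column averages over non-empty values; 0 if a column is entirely empty.
    col_avg = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        col_avg.append(mean(present) if present else 0.0)

    filled = []
    for row in rows:
        present = [v for v in row if v is not None]
        # Row average over non-empty values; 0 if the row is entirely empty.
        row_avg = mean(present) if present else 0.0
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
            elif method == "constant":
                new_row.append(constant)
            elif method == "column_average":
                new_row.append(col_avg[j])
            elif method == "row_average":
                new_row.append(row_avg)
            elif method == "row_interpolation":
                new_row.append(_interpolate(row, j, row_avg))
            else:
                raise ValueError(f"unknown method: {method}")
        filled.append(new_row)
    return filled


table = [[1.0, None, 3.0], [4.0, 5.0, None]]
print(replace_empty(table, "row_interpolation"))
# prints [[1.0, 2.0, 3.0], [4.0, 5.0, 5.0]]
```

In the example, the gap in the first row lies halfway between 1.0 and 3.0 and becomes 2.0, while the trailing gap in the second row has only a left neighbor and takes its value under the stated assumption.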

Results

The hierarchical clustering calculation is performed, and a heat map visualization with the specified dendrograms is created. A cluster column is also added to the data table and made available in the filters panel. See Dendrograms and clustering to learn more about dendrograms and cluster columns.
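
To give a feel for what a hierarchical clustering calculation does, the following Python sketch builds a row dendrogram by repeatedly merging the two closest clusters. It is a generic, naive illustration and not Spotfire's implementation: it assumes Euclidean distance and average linkage, it ignores ordering weight, empty value replacement, and normalization, and the names euclidean, average_linkage, and cluster_rows are hypothetical. The returned labels loosely correspond to the idea of a cluster column obtained by cutting the tree at a chosen number of clusters.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows, members_a, members_b):
    """Mean pairwise distance between the rows of two clusters."""
    total = sum(euclidean(rows[i], rows[j]) for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))


def cluster_rows(rows, n_clusters):
    """Agglomerative clustering of rows.

    Returns (merges, labels): merges is the dendrogram as a list of
    (cluster_id_a, cluster_id_b, distance) tuples in merge order, and
    labels assigns each row a cluster number from 1 to n_clusters.
    """
    if not 1 <= n_clusters <= len(rows):
        raise ValueError("n_clusters must be between 1 and the number of rows")

    # Start with one cluster per row; merged clusters receive new ids.
    clusters = {i: [i] for i in range(len(rows))}
    merges = []
    labels = None
    next_id = len(rows)

    while True:
        if len(clusters) == n_clusters:
            # Snapshot the labels, numbering clusters by their first row.
            labels = [0] * len(rows)
            ordered = sorted(clusters, key=lambda c: min(clusters[c]))
            for label, cid in enumerate(ordered, start=1):
                for i in clusters[cid]:
                    labels[i] = label
        if len(clusters) == 1:
            break
        # Find and merge the closest pair of clusters.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda p: average_linkage(rows, clusters[p[0]], clusters[p[1]]),
        )
        distance = average_linkage(rows, clusters[a], clusters[b])
        merges.append((a, b, distance))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1

    return merges, labels


data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
merges, labels = cluster_rows(data, n_clusters=2)
print(labels)  # prints [1, 1, 2, 2]
```

The sketch recomputes every pairwise distance on each pass, which is fine for a handful of rows but far too slow for real data tables; it is meant only to show how the merge order behind a dendrogram arises.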