Transforming data
Sometimes, the data you want to analyze in Spotfire is not in the most appropriate format, or can even contain errors. It can therefore be useful to modify the data to get the best results from the analysis. Transformations can be added to all in-memory data sources using the installed client. When you work with data from data connections, you can only transform data if it is imported into Spotfire. In-database data from connections (external data), however, cannot be transformed.
Before you begin
About this task
There are also several methods that can be used to transform your data, either when adding a data table, or, as a data transformation step after a data source or data table is added. These transformations are available in the installed client only:
- Calculate and replace column – replace a column in the data table with a calculated column.
- Calculate new column – add a calculated column to the data table.
- Change column names – change the name of one or more of the columns in the data table.
- Change data types – change the data type for one or more of the columns in the data table.
- Data function – use a previously registered data function as a transformation step.
- Exclude columns – exclude one or more of the columns from the data table.
- Filter rows – exclude all rows not matching a boolean
expression.
Tip: You can easily create a filter rows transformation directly from a filter. See Creating a filter transformation.
- Normalization – normalize the data prior to addition of the data table.
- Pivot data – pivot the data, that is, change the data table from a tall/skinny format to a short/wide format.
- Replace specific value – change the value on a specific row in the data table to a different value.
- Replace value – change all instances of a value in a column to a different value.
- Unpivot data – unpivot the data, that is, change the data table from a short/wide format to a tall/skinny format.
- Additional transformations might be available to you if these have been added locally.
You can perform transformations on most of the "regular" column types that are loaded into Spotfire, but not on certain column types whose content changes depending on selections you make in the analysis. Calculated columns, columns created by adding tags to the analysis, and columns created using tools like K-means clustering and Line similarity are some examples of column types that you cannot apply transformations to. Columns that cannot be transformed will not be available for selection in any of the settings dialogs used for transformations.
Applying transformations on the source or afterwards?
Transformations can be applied either upon data loading, or later on, when the data has already been loaded into Spotfire. Having the transformation directly on the data source is generally only necessary if the original data is very large and the transformation itself reduces or remodels the data to load; for example, if an unpivot operation is required to reduce the number of columns. Most of the time, transformations benefit from being added as a separate step in the data history, on the already loaded data. When the transformation group is added as a separate step, you can easily edit or remove one or more transformations without needing to reload the data.
- Editing transformations
If a previously added transformation is no longer working as it should, or, if the transformation parameters must be updated, you can edit the transformation in the data canvas of the installed Spotfire client. - Calculate and replace column
The Calculate and Replace Column transformation is used to calculate a new column and replace a column in a data table with the new, calculated column. - Calculate new column
The Calculate New Column transformation is used to add a new calculated column to a data table. - Change column names
The Change Column Names transformation is used to change the names of one or more of the columns in a data table. - Change data types
The Change Data Types transformation is used to change the data types of one or more of the columns in a data table. - Data functions - transformation
The Data functions - transformation transformation is used when you want to use a data function from the library as a transformation. Data functions can be used as transformations if they use a single data table as input and produce a single data table as output. - Exclude columns
The Exclude Columns transformation is used to exclude one or more columns from a data table. - Filter rows
The Filter rows transformation is used to exclude rows based on a boolean expression. Only those rows matching the expression (the rows for which the expression is True) will be included in the data table. - Normalization
The Normalization transformation is used to normalize the data in one or more columns of a data table. - Pivot data
The Pivot Data transformation is used to transform data from a tall/skinny format to a short/wide format. - Replace specific value
The Replace Specific Value transformation is used to edit a previously added transformation from the data canvas. It can also be used if you must add a replace value transformation to an earlier stage in the data table history (not to the final data table). - Replace value
The Replace Value transformation is used to edit a previously added transformation from the data canvas. It can also be used if you must add a replace value transformation to an earlier stage in the data table history (not to the final data table). - Unpivot data
The Unpivot Data transformation is used to transform data from a short/wide format to a tall/skinny format. When you use the Unpivot data dialog for the transformation, you can immediately set the desired names for the new category and value columns. - Preview
The Preview dialog lets you see what the data will look like after the selected transformation has been applied.
Applying transformations when loading data
When using the installed client, you can add transformations while you add the data to be transformed to the analysis, during the data loading configuration.
About this task
Procedure
Applying transformations to data already loaded into the analysis
If you discover that your current data table needs some transformation before you can use your data in a visualization, you can add a transformation to imported data at any time. By following the steps here, the transformation will be added to the currently last node (as shown in the data canvas). Should you discover that a transformation is needed at an earlier step in your data history (for example, if you need to add a transformation before an already existing Added columns operation), then follow the steps to add transformations to a specific node instead.
Procedure
Adding transformations to a specific node in the source view
If it is important to control exactly when the transformation is applied in relation to where rows or columns are added to the data table, you can add the transformations anywhere in the source structure from the Data canvas.
Procedure
Results
See also Editing transformations, Removing operations in the data canvas and Replacing a value.