Pivot

This operator transforms the categorical data contained in a column of a table into columns of a new table, by means of subtotals (or other calculations) that can be defined by another column in the same list. The other calculations can be averages and counts.

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter	Description
Category	Transform
Data source type	TIBCO® Data Virtualization
Send output to other operators	Yes
Data processing tool	TIBCO® DV, Apache Spark 3.2 or later

Algorithm

For typical data entry and storage, data usually appears in flat tables such that it only consists of rows and columns. While such data contains a lot of information, it can be difficult to get the information summary. A pivot table helps quickly summarize the flat data, giving it depth, and highlighting the desired information.

The usage of a pivot table is extremely broad and depends on the situation. The first question to ask is "What am I looking for?" A pivot table usually consists of rows, columns, and data (or fact) fields. These fields allow several kinds of aggregations including sum, average, count, max, min, and so on.

The Pivot Column can only be categorical. In the output table, the Pivot Column is transformed into multiple columns, one for each category. To run this operator, you must define at least one variable with a categorical data type.

The results are also grouped by a selected column.
The values in the new columns are aggregates of a third column (or of the presence of the category if no aggregate column is chosen).
The columns are listed (in the table, file, or array) in a descending order of the value of the category that they represent.

Note: This operator can only connect to subsequent operators in certain situations - see the Output section for details.

Input

An input is a single tabular data set.

Bad or Missing Values

Null values are not allowed and result in an error.

Configuration

The following table provides the configuration details for the Pivot operator.

Parameter	Description
Notes	Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Pivot Column	Specify the column for pivot transformations. On database data sources, you can only select the categorical columns to pivot.
Group By	Specify the column to group by.
Aggregate Column	Specify the pivot column's value column.
Aggregation	Specifies the aggregate function. The following values are available: Sum Average Count Max Min
Output Schema	Specify the schema for the output table or view.
Output Table	Specify the table path and name where the output of the results is generated. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results	When set to Yes, the operator saves the results. If set to No, the operator does not save the results.

Output

Visual Output

Output: A table that displays the output of a data set for the succeeding operator.

Example

The following example demonstrates the Pivot operator.

Pivot workflow

Data

golf: This data set contains the following information:

Multiple columns namely outlook, temperature, wind, humidity, and play.
Multiple rows (14 rows).

Parameter Setting

The parameter settings for the golf data set are as follows:

Pivot Column: play
Group By: outlook
Aggregate Column: temperature
Aggregation: sum
Store Results: Yes

Output

The following figure displays the output for the parameter settings for the golf data set.

Pivot output tab

Did you find this helpful?

Yes No