Unstack

Information at a Glance

Category	Transform
Data source type	HD
Sends output to other operators	Yes
Data processing tool	Spark SQL

The operation of Unstack is similar to a pivot operation, except that it is capable of operating on text data as well as numeric data, whereas a pivot operation works on numeric data only. Unstack takes multiple Columns to Keep (group by), a single column that contains the column names (pivot column), and a single column that contains the values of the new columns in the output data set (aggregate column) as inputs.

The maximum number of new columns that can be produced is 10,000. If the pivot column contains more than 10,000 distinct values, an error results.

Missing data appears as null values in the output data set. Unstack handles all column types as inputs except date/time and Boolean.

Input

A single tabular data set.

Configuration

Parameter	Description
Notes	Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
New Column Names	A column name that contains the new column header.
New Column Values	A column name that contains the values.
Aggregation Method	The method that aggregates the new values.
Columns to Keep	Any number of non-date/time and Boolean columns.
Storage Format	Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet.
Compression	Select the type of compression for the output. Available Parquet compression options. GZIP Deflate Snappy no compression Available Avro compression options. Deflate Snappy no compression
Output Directory	The location to store the output files.
Output Name	The name to contain the results.
Overwrite Output	Specifies whether to delete existing data at that path. Yes - if the path exists, delete that file and save the results. No - fail if the path already exists.
Advanced Spark Settings Automatic Optimization	Yes specifies using the default Spark optimization settings. No enables providing customized Spark optimization. Click Edit Settings to customize Spark optimization. See Advanced Settings Dialog Box for more information.

Output

Visual Output: A tabular preview of the output data set. Summary output includes parameters selected, input data size, and output location.
Data Output: A single tabular data set.
Note: Unstack is a 'semi-terminal' operator. A partial schema can be transmitted to subsequent operators at design time, but you must run the operator for subsequent operators to see the final output schema.

Example

In the following illustration, name and date are the row identifiers, while header and value are the columns to pivot. To complete the operation, Unstack pivots the text values (text value 1 and text value 2), and then places the values in the proper row. Where name equals cc, there is only one date (date4) and two headers. This results in one row. Where name equals bb, there are three dates, each with one header. This results in three rows. Columns in which the header values do not exist for the dates are left empty.

Contents