Unpivot (HD)

Unpivots one or more columns.

Information at a Glance

Category Transform
Data source type HD
Sends output to other operators Yes
Data processing tool Spark
Note: The Unpivot (HD) operator is for Hadoop data only. For database data, use the Unpivot (DB) operator.

The selected columns are removed from the input and flattened into the following two new columns, which are added at the end of the output data set:

  • The first column, whose values are the names of the chosen columns.
  • The second column, whose values are the corresponding values in the chosen columns.

Input

A data set from HDFS.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Columns The columns to unpivot. All data types are supported.
Name of Variable Column The name of the first new column. This contains the names of the columns to unpivot.
Note: The value must be alphanumeric. (The regular expression to match is "^[A-Za-z]+\\w*$".)
Name of Value Column The name of the second new column. This contains the values of the columns to unpivot.
Note: The value must be alphanumeric. (The regular expression to match is "^[A-Za-z]+\\w*$".)
Storage Format Select the format in which to store the results. The storage format is determined by your type of operator.

Typical formats are Avro, CSV, TSV, or Parquet.

Compression Select the type of compression for the output.
Available Parquet compression options:
  • GZIP
  • Deflate
  • Snappy
  • no compression

Available Avro compression options:

  • Deflate
  • Snappy
  • no compression
Output Directory The location to store the output files.
Output Name The name of the output that contains the results.
Overwrite Output Specifies whether to delete existing data at the output path.
  • Yes - if the path exists, delete that file and save the results.
  • No - fail if the path already exists.
Advanced Spark Settings Automatic Optimization
  • Yes specifies using the default Spark optimization settings.
  • No enables providing customized Spark optimization. Click Edit Settings to customize Spark optimization. See Advanced Settings Dialog Box for more information.
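
The naming rule for Name of Variable Column and Name of Value Column can be checked before you configure the operator. The following is a minimal sketch, not the operator's own validation code. It assumes that the doubled backslash in the documented pattern is string escaping, so the effective expression is ^[A-Za-z]+\w*$, and that \w is limited to ASCII letters, digits, and the underscore. The function name is_valid_column_name is hypothetical.

```python
import re

# Assumed reading of the documented pattern: the effective regular
# expression is ^[A-Za-z]+\w*$ (one or more letters, then word characters).
# re.ASCII limits \w to [A-Za-z0-9_]; this restriction is an assumption.
COLUMN_NAME_PATTERN = re.compile(r"^[A-Za-z]+\w*$", re.ASCII)


def is_valid_column_name(name: str) -> bool:
    """Return True if the name satisfies the assumed pattern."""
    return COLUMN_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Subject", "Grade_1", "1Grade", "New Value"]:
        print(candidate, is_valid_column_name(candidate))
```

Under this reading, a name must start with a letter and cannot contain spaces, although an underscore is accepted after the first letter.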

Output

If you select X columns to unpivot from an input with Y columns and N rows, the output data set has (Y-X+2) columns and (X * N) rows.
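
The size formula above can be expressed as a small helper. This is an illustrative sketch only; the function name unpivot_shape and its argument names are hypothetical.

```python
from typing import Tuple


def unpivot_shape(total_columns: int, unpivoted_columns: int, rows: int) -> Tuple[int, int]:
    """Return (output_columns, output_rows) for Y input columns, X unpivoted columns, N rows."""
    if not 1 <= unpivoted_columns <= total_columns:
        raise ValueError("unpivoted_columns must be between 1 and total_columns")
    if rows < 0:
        raise ValueError("rows must not be negative")
    output_columns = total_columns - unpivoted_columns + 2  # Y - X + 2
    output_rows = unpivoted_columns * rows                  # X * N
    return output_columns, output_rows


if __name__ == "__main__":
    # Y = 4 input columns, X = 3 unpivoted columns, N = 2 rows
    print(unpivot_shape(4, 3, 2))  # (3, 6)
```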

Data Output


Note:
  • The New Variable column contains the names of the unpivoted columns in chararray format.
  • For the New Value column:
    • If all columns selected to unpivot are numeric, the resulting value column is double.
    • If all columns selected to unpivot are datetime with the exact same format, the resulting value column is datetime with this same format.
    • For all other cases, the resulting value column is chararray.
  • All null values are kept in the output.
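
The type rules for the New Value column in the note above can be sketched as follows. This is an approximation for illustration, not the operator's implementation. The type labels double, datetime, and chararray come from the note; the other numeric labels and the function name value_column_type are assumptions.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical set of type labels treated as numeric in this sketch.
NUMERIC_TYPES = {"int", "long", "float", "double"}


def value_column_type(columns: Sequence[Tuple[str, Optional[str]]]) -> str:
    """Decide the value column type.

    Each entry is (type_name, datetime_format), where datetime_format is
    None for columns that are not datetime.
    """
    if not columns:
        raise ValueError("at least one column must be selected")
    types = [type_name for type_name, _ in columns]
    if all(t in NUMERIC_TYPES for t in types):
        return "double"
    formats = {fmt for _, fmt in columns}
    if all(t == "datetime" for t in types) and len(formats) == 1:
        return "datetime"  # all columns share the exact same format
    return "chararray"     # every other combination


if __name__ == "__main__":
    print(value_column_type([("int", None), ("double", None)]))
    print(value_column_type([("int", None), ("chararray", None)]))
```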

Example

Name Mathematics Science English
John 90 70 50
Matt 60 40 80

After you select the Mathematics, Science, and English columns to unpivot, and specify new columns named Subject and Grade, the result is as follows:

Name Subject Grade
John Mathematics 90
John Science 70
John English 50
Matt Mathematics 60
Matt Science 40
Matt English 80
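
The transformation in this example can be reproduced with a short plain Python sketch. It is not the operator's Spark implementation and does no type conversion; the function name unpivot and the representation of rows as dictionaries are assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence


def unpivot(
    rows: Sequence[Dict[str, Any]],
    columns: Sequence[str],
    variable_name: str,
    value_name: str,
) -> List[Dict[str, Any]]:
    """Emit one output row per (input row, unpivoted column) pair."""
    result = []
    for row in rows:
        # Columns that are not unpivoted are carried over unchanged.
        kept = {key: value for key, value in row.items() if key not in columns}
        for column in columns:
            new_row = dict(kept)
            new_row[variable_name] = column    # name of the source column
            new_row[value_name] = row[column]  # its value; nulls (None) are kept
            result.append(new_row)
    return result


if __name__ == "__main__":
    grades = [
        {"Name": "John", "Mathematics": 90, "Science": 70, "English": 50},
        {"Name": "Matt", "Mathematics": 60, "Science": 40, "English": 80},
    ]
    for out in unpivot(grades, ["Mathematics", "Science", "English"], "Subject", "Grade"):
        print(out["Name"], out["Subject"], out["Grade"])
```

Run on the two input rows above, the sketch yields six rows with the columns Name, Subject, and Grade, in the same order as the result table.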