Unpivot (HD)
Unpivots one or more columns.
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | HD |
| Send output to other operators | Yes |
| Data processing tool | Spark |
The columns selected are removed from the input and flattened into the following two new columns at the end of the output data set.
- The first column, whose values are the names of the chosen columns.
- The second column, whose values are the corresponding values in the chosen columns.
Input
A data set from HDFS.
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Columns | The columns to unpivot. All data types are supported. |
| Name of Variable Column | The name of the first new column. This column contains the names of the columns to unpivot. Note: The value must be alphanumeric (the regular expression to match is "^[A-Za-z]+\\w*$"). |
| Name of Value Column | The name of the second new column. This column contains the values of the columns to unpivot. Note: The value must be alphanumeric (the regular expression to match is "^[A-Za-z]+\\w*$"). |
| Storage Format | Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet. |
| Compression | Select the type of compression for the output. Compression options are available for the Parquet and Avro formats. |
| Output Directory | The location to store the output files. |
| Output Name | The name of the output that contains the results. |
| Overwrite Output | Specifies whether to delete existing data at the output path. |
| Advanced Spark Settings Automatic Optimization | |
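The naming rule for the two new columns can be checked before the operator is configured. The sketch below is only an illustration in plain Python, not the product's own validation code. It assumes the documented expression is `^[A-Za-z]+\w*$` and that `\w` is limited to ASCII characters; the function name `is_valid_column_name` is hypothetical.

```python
import re

# Assumed form of the documented pattern: one or more letters,
# followed by any number of word characters (letters, digits, underscore).
# re.ASCII restricts \w to ASCII, which is an assumption about the product.
COLUMN_NAME_PATTERN = re.compile(r"^[A-Za-z]+\w*$", re.ASCII)


def is_valid_column_name(name: str) -> bool:
    """Return True if the whole name matches the assumed pattern."""
    return COLUMN_NAME_PATTERN.fullmatch(name) is not None


# Try a few candidate names for the variable and value columns.
for candidate in ["Subject", "Grade_2024", "2ndGrade", "Final Grade", ""]:
    print(f"{candidate!r}: {is_valid_column_name(candidate)}")
```

Under this reading, a name must start with a letter and cannot contain spaces. Because `\w` also matches the underscore, a name such as `Grade_2024` passes even though it is not strictly alphanumeric.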
Output
If you select X columns to unpivot from an input with Y columns and N rows, the output data set has (Y-X+2) columns and (X * N) rows.
- The New Variable column contains the names of the unpivoted columns in chararray format.
- For the New Value column:
  - If all columns selected to unpivot are numeric, the resulting value column is double.
  - If all columns selected to unpivot are datetime with the exact same format, the resulting value column is datetime with this same format.
  - For all other cases, the resulting value column is chararray.
- All null values are kept in the output.
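The sizing and typing rules above can be sketched in a few lines of plain Python. This is an illustration of the stated rules only, not the operator's implementation. The function names, the set of numeric type names, and the datetime format strings are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical names for the numeric column types.
NUMERIC_TYPES = {"int", "long", "float", "double"}


def output_shape(total_columns: int, unpivoted_columns: int, rows: int) -> Tuple[int, int]:
    """Return (columns, rows) of the output: (Y - X + 2, X * N)."""
    return total_columns - unpivoted_columns + 2, unpivoted_columns * rows


def value_column_type(
    selected: List[Tuple[str, Optional[str]]]
) -> Tuple[str, Optional[str]]:
    """Infer the value column type from (type name, datetime format) pairs."""
    if not selected:
        raise ValueError("at least one column must be selected")
    type_names = {type_name for type_name, _ in selected}
    if type_names <= NUMERIC_TYPES:
        return ("double", None)  # all numeric -> double
    formats = {fmt for _, fmt in selected}
    if type_names == {"datetime"} and len(formats) == 1:
        return ("datetime", formats.pop())  # same format is preserved
    return ("chararray", None)  # every other combination


# 4 input columns, 3 of them unpivoted, 2 rows.
print(output_shape(4, 3, 2))
print(value_column_type([("int", None), ("double", None)]))
print(value_column_type([("datetime", "yyyy-MM-dd"), ("datetime", "MM/dd/yyyy")]))
```

Two datetime columns with different formats fall through to chararray, as does any mix of numeric and non-numeric columns.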
Example
| Name | Mathematics | Science | English |
|---|---|---|---|
| John | 90 | 70 | 50 |
| Matt | 60 | 40 | 80 |
After you select the Mathematics, Science, and English columns to unpivot, and specify new columns named Subject and Grade, the result is as follows:
| Name | Subject | Grade |
|---|---|---|
| John | Mathematics | 90 |
| John | Science | 70 |
| John | English | 50 |
| Matt | Mathematics | 60 |
| Matt | Science | 40 |
| Matt | English | 80 |
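For readers who want to trace the transformation step by step, the example above can be reproduced with a small plain-Python sketch. The operator itself runs on Spark. The `unpivot` function here is a hypothetical stand-in that only mirrors the documented behavior on an in-memory list of rows, and it performs no type conversion.

```python
from typing import Any, Dict, List


def unpivot(
    rows: List[Dict[str, Any]],
    columns: List[str],
    variable_name: str,
    value_name: str,
) -> List[Dict[str, Any]]:
    """Emit one output row per (input row, unpivoted column) pair."""
    result = []
    for row in rows:
        # Columns that are not unpivoted are carried over unchanged.
        kept = {key: value for key, value in row.items() if key not in columns}
        for column in columns:
            new_row = dict(kept)
            new_row[variable_name] = column    # name of the source column
            new_row[value_name] = row[column]  # its value (nulls are kept)
            result.append(new_row)
    return result


grades = [
    {"Name": "John", "Mathematics": 90, "Science": 70, "English": 50},
    {"Name": "Matt", "Mathematics": 60, "Science": 40, "English": 80},
]

unpivoted = unpivot(grades, ["Mathematics", "Science", "English"], "Subject", "Grade")
for row in unpivoted:
    print(row)
```

The two new columns are appended after the remaining input columns, and the six output rows follow the same order as the result table above.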