Transpose
Allows you to rearrange data so that rows and columns are switched.
Information at a Glance
Category | Transform |
Data source type | HD |
Sends output to other operators | Yes |
Data processing tool | Spark |
You can choose which input column defines the new header. Rows and columns are switched in the output: each input column becomes an output row, and each input row becomes an output column.
In the following example, the Name column is selected to be the output header.
After Transpose is run with Name as the header column, the data set looks like the following example.
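As an illustration of the behavior described above, here is a minimal Python sketch of a transpose with a header column. It is not the operator's Spark implementation. The function name `transpose`, the sample data, and the choice to put the original column names in the first output column (and to drop the header column from the output rows) are assumptions made for this example.

```python
def transpose(columns, rows, header_column, first_column_name=None):
    """Transpose a small in-memory table.

    columns: list of input column names
    rows: list of tuples, one per input row
    header_column: name of the column whose values become the new header
    first_column_name: optional name for the first output column
    """
    h = columns.index(header_column)
    # The new header starts with the first-column name (assumed to default to
    # the header column's own name), followed by that column's values.
    new_header = [first_column_name or header_column] + [row[h] for row in rows]
    # Every other input column becomes one output row, labelled by its name.
    new_rows = [
        [name] + [row[i] for row in rows]
        for i, name in enumerate(columns)
        if i != h
    ]
    return new_header, new_rows


# Hypothetical sample data, with Name selected as the header column.
columns = ["Name", "Age", "City"]
rows = [("Ann", 34, "Oslo"), ("Bob", 27, "Rome")]

header, data = transpose(columns, rows, "Name")
print(header)  # ['Name', 'Ann', 'Bob']
for row in data:
    print(row)  # ['Age', 34, 27] then ['City', 'Oslo', 'Rome']
```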
Input
A data set from HDFS. At least one categorical column is required to define the new header.
Restrictions
This operator cannot transpose an input larger than 5,000 rows. If the input has a single column and you select this column to be the new header, an error occurs while the operator is being configured.
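The two restrictions above can be expressed as a pre-flight check. The following Python sketch is only an illustration: the function `check_transpose_input` and its messages are hypothetical and are not part of the operator.

```python
MAX_ROWS = 5000  # row limit stated in the Restrictions section


def check_transpose_input(row_count, columns, header_column=None):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if row_count > MAX_ROWS:
        problems.append(
            f"input has {row_count} rows; the limit is {MAX_ROWS}"
        )
    # A single-column input whose only column is also the header column
    # leaves nothing to transpose.
    if header_column is not None and columns == [header_column]:
        problems.append(
            "the only input column cannot also be the new header column"
        )
    return problems


print(check_transpose_input(6000, ["Name", "Age"]))
print(check_transpose_input(10, ["Name"], header_column="Name"))
```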
Configuration
Parameter | Description |
---|---|
Notes | Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator. |
Column for New Header |
Optionally, select a categorical (chararray) column whose name and values define the new header. If no column is selected, the output uses the default header (Column1, Column2, ..., ColumnX).
Note: If the selected column contains null or duplicate values, the job fails at runtime with a meaningful error message.
Non-alphanumeric characters in values are replaced by an underscore in the new header. If a value starts with a non-letter character, the letter "a" is prepended so that the name matches the column name regular expression "^[A-Za-z]+\w*$". |
New Name for First Column |
Optional new name for the first column in the output. The name must match the regular expression "^[A-Za-z]+\w*$".
If you do not want to specify a name, leave the box empty (the default). |
Storage Format | Select the format in which to store the results. The storage format is determined by your type of operator.
Typical formats are Avro, CSV, TSV, or Parquet. |
Compression | Select the type of compression for the output.
Available Avro compression options. |
Output Directory | The location to store the output files. |
Output Name | The name of the output that contains the results. |
Overwrite Output | Specifies whether to delete existing data at that path. |
Advanced Spark Settings Automatic Optimization |
|
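The header rules described for Column for New Header can be sketched in Python as follows. This is an approximation written from the description above, not the operator's own code. The function names `sanitize` and `build_header` are hypothetical, and checking for null and duplicate values before sanitizing is an assumption.

```python
import re

# Column name pattern quoted in the parameter descriptions.
COLUMN_NAME = re.compile(r"^[A-Za-z]+\w*$")


def sanitize(value):
    """Turn a header-column value into a valid column name."""
    # Replace every non-alphanumeric character with an underscore.
    name = re.sub(r"[^A-Za-z0-9]", "_", value)
    # Prepend "a" when the name does not start with a letter.
    if not name[:1].isalpha():
        name = "a" + name
    return name


def build_header(values):
    """Validate the header-column values, then sanitize each one."""
    if any(v is None for v in values):
        raise ValueError("header column contains null values")
    if len(set(values)) != len(values):
        raise ValueError("header column contains duplicate values")
    return [sanitize(v) for v in values]


print(build_header(["New-York", "2019 Q1", "Oslo"]))
# ['New_York', 'a2019_Q1', 'Oslo']
```

Every name returned by `sanitize` matches `COLUMN_NAME`. The sketch does not cover the case where two distinct values become identical after sanitizing.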
Output
- Visual Output
- Data Output
- This is a semi-terminal operator: it can be connected to any subsequent operator at design time, but it does not transmit the full output schema until the user runs it. At design time, the partial output schema contains only the first column of the output. After the operator runs, the output schema is updated automatically, and subsequent operators turn red if their parameter selections are no longer valid.
Note: The final output schema of the Transpose operator is cleared if one of the following events occurs.
- The user changes the configuration properties of the Transpose operator.
- The user changes the input connected to the Transpose operator.
- The user clears the step run results of the Transpose operator.
In any of these cases, the output schema transmitted to subsequent operators reverts to the partial schema defined at design time, so subsequent operators can become invalid. The user must run the Transpose operator again to transmit the new output schema.