Reorder Columns (HD)

Reorders one or more columns from an input table, and optionally renames them.

Information at a Glance

Category: Transform
Data source type: HD
Sends output to other operators: Yes
Data processing tool: Spark
Note: The Reorder Columns (HD) operator is for Hadoop data only. For database data, use the Reorder Columns (DB) operator.

Input

A tabular data set.

Configuration

Notes: Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Ordered Columns: Click Define to specify the columns (in order) to become the first columns in the output, and optionally specify a new name for each. See Ordered Columns Dialog Box for more information.
Columns to Keep: Specify any other columns to keep in the output.
Storage Format: Select the format in which to store the results. The available formats depend on the type of operator. Typical formats are Avro, CSV, TSV, and Parquet.

Compression: Select the type of compression for the output.
Available Parquet compression options:
  • GZIP
  • Deflate
  • Snappy
  • No compression

Available Avro compression options:
  • Deflate
  • Snappy
  • No compression
Output Directory: The location to store the output files.
Output Name: The name under which to store the results.
Overwrite Output: Specifies whether to delete existing data at the output path.
  • Yes - If the path exists, delete the existing data and save the results.
  • No - Fail if the path already exists.
Advanced Spark Settings Automatic Optimization:
  • Yes - Use the default Spark optimization settings.
  • No - Provide customized Spark optimization settings. Click Edit Settings to customize them. See Advanced Settings Dialog Box for more information.
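
The output settings above can be illustrated with a small sketch. This is not the operator's implementation: the helper names check_compression and prepare_output are hypothetical, a local file system stands in for Hadoop storage, and the sketch assumes that formats with no compression list in this topic (such as CSV and TSV) are simply not checked.

```python
import shutil
import tempfile
from pathlib import Path

# Compression options listed in this topic, keyed by storage format (lowercase).
COMPRESSION_OPTIONS = {
    "parquet": {"gzip", "deflate", "snappy", "no compression"},
    "avro": {"deflate", "snappy", "no compression"},
}


def check_compression(storage_format: str, compression: str) -> bool:
    """Return True if the compression is listed for the storage format.

    Assumption: formats without a listed option set are not validated.
    """
    allowed = COMPRESSION_OPTIONS.get(storage_format.lower())
    return allowed is None or compression.lower() in allowed


def prepare_output(directory: str, name: str, overwrite: bool) -> Path:
    """Apply the Overwrite Output rule and return the target path.

    Yes (overwrite=True): delete existing data at the path.
    No (overwrite=False): fail if the path already exists.
    """
    target = Path(directory) / name
    if target.exists():
        if not overwrite:
            raise FileExistsError(f"{target} already exists")
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return target


print(check_compression("Parquet", "GZIP"))  # True
print(check_compression("Avro", "GZIP"))     # False

with tempfile.TemporaryDirectory() as tmp:
    out = prepare_output(tmp, "reordered", overwrite=False)
    out.mkdir()  # simulate results written by an earlier run
    out = prepare_output(tmp, "reordered", overwrite=True)
    print(out.exists())  # False: the earlier results were deleted
```

With overwrite set to False, the second call in the example would raise FileExistsError instead of deleting the earlier results.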

Outputs

Visual Output
The visual output has two parts: Output, a preview of the output data set, and Summary.

Data Output
A tabular data set with reordered and (optionally) renamed columns.
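
As a rough illustration of the resulting column order, the following sketch uses plain Python dictionaries rather than Spark. The function name reorder_columns and the sample data are hypothetical, and the sketch assumes that columns listed in neither Ordered Columns nor Columns to Keep are dropped from the output.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def reorder_columns(
    rows: List[Row],
    ordered: List[Tuple[str, Optional[str]]],
    keep: List[str],
) -> Tuple[List[str], List[Row]]:
    """Place `ordered` columns first, renamed where a new name is given,
    followed by the `keep` columns in the order listed."""
    ordered_sources = {src for src, _ in ordered}

    # Build (source name, output name) pairs in output order.
    plan = [(src, new or src) for src, new in ordered]
    plan += [(col, col) for col in keep if col not in ordered_sources]

    header = [out for _, out in plan]
    if len(set(header)) != len(header):
        raise ValueError("duplicate output column names")

    # Columns absent from the plan are dropped (an assumption of this sketch).
    out_rows = [{out: row[src] for src, out in plan} for row in rows]
    return header, out_rows


rows = [{"id": 1, "name": "Ada", "age": 36, "city": "London"}]
header, result = reorder_columns(
    rows,
    ordered=[("city", "location"), ("id", None)],  # rename city, keep id as is
    keep=["name"],
)
print(header)  # ['location', 'id', 'name']
print(result)  # [{'location': 'London', 'id': 1, 'name': 'Ada'}]
```

Here city is moved to the front and renamed to location, id follows unchanged, name is kept after the ordered columns, and age does not appear in the output.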