Sort By Multiple Columns

Sorts a data set by up to three columns that you choose and returns the sorted data set. Optionally, the operator adds a column called row_index that lets you filter the output based on the sorting results.
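
The sorting behavior described above can be illustrated with a minimal sketch in plain Python. This is not the operator's Spark implementation. The function name sort_by_columns, the sample rows, and the choice to start row_index at 1 are assumptions made for illustration only.

```python
from operator import itemgetter

def sort_by_columns(rows, sort_spec, add_row_index=False):
    """Sort dict rows by one to three (column, descending) pairs."""
    if not 1 <= len(sort_spec) <= 3:
        raise ValueError("choose between one and three sort columns")
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant column
    # first and the primary column last gives a correct multi-column order.
    for column, descending in reversed(sort_spec):
        result.sort(key=itemgetter(column), reverse=descending)
    if add_row_index:
        # Assumption: the index starts at 1; the operator's base may differ.
        result = [{**row, "row_index": i} for i, row in enumerate(result, start=1)]
    return result

# Hypothetical sample data.
people = [
    {"name": "Ana", "age": 34, "income": 52000},
    {"name": "Ben", "age": 28, "income": 61000},
    {"name": "Cy", "age": 34, "income": 48000},
    {"name": "Di", "age": 28, "income": 45000},
]

# Primary column: age ascending. Secondary column: income descending.
sorted_people = sort_by_columns(
    people, [("age", False), ("income", True)], add_row_index=True
)
for row in sorted_people:
    print(row)
```

Each sort column carries its own order, which mirrors the separate sort order parameter that the operator provides for each column.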

Information at a Glance

Category	Transform
Data source type	HD
Sends output to other operators	Yes
Data processing tool	Spark

Input

A tabular data set from HDFS.

Configuration

Parameter	Description
Notes	Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Primary Sort Column	First column to sort by. While Secondary Sort Column and Tertiary Sort Column can be left blank, this column is required.
Primary Column Sort Order	Order by which to sort the first column: Ascending (the default) or Descending.
Secondary Sort Column	Second column to sort by. To sort by one column only, leave this column and the Tertiary Sort Column blank.
Secondary Column Sort Order	Order by which to sort the second column: Ascending (the default) or Descending.
Tertiary Sort Column	Third column to sort by. To sort by two columns only, leave this one blank.
Tertiary Column Sort Order	Order by which to sort the third column: Ascending (the default) or Descending.
Create 'row_index' Column	Specifies whether to add an extra column, row_index, that shows each row's index in the sort order. Default value: No.
Write Rows Removed Due to Null Data to File	Rows with null values in any of the columns selected to sort by are removed from the analysis; null values in other columns do not cause a row to be removed. This parameter specifies whether the removed rows are written to a file. The file is written to the same directory as the rest of the output, and its filename is suffixed with _baddata. Options: Do Not Write Null Rows to File (the default) - remove rows with null values and report them in the result UI, but do not write them to an external file. Do Not Write or Count Null Rows (Fastest) - remove rows with null values without counting them or reporting them in the result UI. Write All Null Rows to File - remove rows with null values and write all removed rows to an external file.
Storage Format	Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet.
Compression	Select the type of compression for the output. Available Parquet compression options: GZIP, Deflate, Snappy, or no compression. Available Avro compression options: Deflate, Snappy, or no compression.
Output Directory	The location to store the output files.
Output Name	The name of the output that contains the results.
Overwrite Output	Specifies whether to delete existing data at the output path. Yes - if the path exists, delete the existing data and save the results. No - fail if the path already exists.
Advanced Spark Settings Automatic Optimization	Yes - use the default Spark optimization settings. No - provide customized Spark optimization settings; click Edit Settings to customize them. See Advanced Settings Dialog Box for more information.
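
The null handling described for Write Rows Removed Due to Null Data to File can be sketched roughly as follows, in plain Python rather than Spark. The helper names split_null_rows and write_bad_data, the CSV layout, and the exact way the _baddata suffix is joined to the output name are assumptions for illustration; the operator's actual file naming and format may differ.

```python
import csv
import os
import tempfile

def split_null_rows(rows, sort_columns):
    """Separate rows that have a null (None) in any selected sort column."""
    kept, removed = [], []
    for row in rows:
        if any(row[col] is None for col in sort_columns):
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed

def write_bad_data(removed, output_dir, output_name):
    """Write removed rows beside the main output, with a _baddata suffix."""
    if not removed:
        return None  # nothing to write
    # Assumption: the suffix is appended directly to the output name.
    path = os.path.join(output_dir, output_name + "_baddata")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(removed[0].keys()))
        writer.writeheader()
        writer.writerows(removed)
    return path

# Hypothetical sample data.
rows = [
    {"name": "Ana", "age": 34, "income": 52000},
    {"name": "Ben", "age": None, "income": 61000},  # null in a sort column
    {"name": None, "age": 28, "income": 45000},     # null in a non-sort column
]

kept, removed = split_null_rows(rows, ["age", "income"])
print(len(kept), "rows kept;", len(removed), "rows removed")

with tempfile.TemporaryDirectory() as out_dir:
    bad_path = write_bad_data(removed, out_dir, "sorted_people")
    bad_file_name = os.path.basename(bad_path)
    print("removed rows written to", bad_file_name)
```

Only the row with a null in a sort column is removed; the row whose null is in the name column stays in the data to be sorted.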

Output

Visual Output: The following example is sorted by age, then income.
Data Output: The input data set sorted by the selected columns, plus the extra row_index column if selected.
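
As a rough illustration of filtering on row_index downstream, the following plain Python sketch keeps only the first rows of an already sorted result. The sample values, the 1-based index, and the helper name first_rows are hypothetical and are not taken from the operator's actual output.

```python
# Hypothetical operator output: sorted by age, then income, with row_index added.
# The values and the 1-based index are made up for illustration.
sorted_output = [
    {"age": 28, "income": 45000, "row_index": 1},
    {"age": 28, "income": 61000, "row_index": 2},
    {"age": 34, "income": 48000, "row_index": 3},
    {"age": 34, "income": 52000, "row_index": 4},
]

def first_rows(rows, limit):
    """Keep only the rows whose sort position is within the first `limit`."""
    return [row for row in rows if row["row_index"] <= limit]

top_two = first_rows(sorted_output, 2)
for row in top_two:
    print(row)
```

Because the filter reads the stored index instead of relying on row order, it still selects the same rows if a later step reorders the data.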

Copyright © Cloud Software Group, Inc. All rights reserved.