Variable (HD)

Information at a Glance

Parameter	Description
Category	Transform
Data source type	HD
Send output to other operators	Yes
Data processing tool	Pig

Note: The Variable (HD) operator is for Hadoop data only. For database data, use the Variable (DB) operator.

Important: The created variables are static. They cannot change dynamically at runtime.

The Variable operator can also divide the data rows into quantiles and add the resulting quantile variables to the data. Splitting the data into progressively finer quantiles helps reveal the overall distribution of the data.
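The following is a minimal sketch of what quantile binning does to a column, written in Python for illustration only. It is not the operator's implementation or expression syntax. The function names, the sample `incomes` values, and the interpretation of automatic binning as equal-count bins over ascending values are assumptions; manually defined boundaries stand in for customized bins.

```python
from bisect import bisect_right


def equal_count_bins(values, n_bins):
    """Assign each value a bin index (0..n_bins-1) so bins hold roughly equal counts.

    Values are ranked in ascending order and the ranks are split evenly.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def custom_bins(values, boundaries):
    """Assign each value a bin index from sorted, manually chosen boundaries.

    A value equal to a boundary falls into the higher bin.
    """
    return [bisect_right(boundaries, v) for v in values]


# Hypothetical sample column.
incomes = [1200, 5400, 3100, 800, 2900, 7600, 4100, 2000]

auto = equal_count_bins(incomes, 4)                # four bins of two rows each
manual = custom_bins(incomes, [1000, 3000, 5000])  # bins split at 1000, 3000, 5000

print(auto)    # [0, 3, 2, 0, 1, 3, 2, 1]
print(manual)  # [1, 3, 2, 0, 1, 3, 2, 1]
```

Each resulting bin index would be stored as a new quantile variable alongside the original column.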

Input

An operator that can output a data set.

Configuration

Parameter	Description
Notes	Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Variables	Define the expression(s) to create the new Variable column(s). For details, see Define Variables dialog and Define Quantile Variables dialog.
Quantile Variables	If the new variable to create is a quantile variable, select the required column(s) to use for deriving the quantiles. The possible quantile types are Average Ascend (which automatically creates the bins) and Customize (which manually defines the variable bins).
Columns	See Select Columns dialog.

Store Results?	Specifies whether to store the results. true - results are stored. false - the data set is passed to the next operator without storing.
Results Location	The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer dialog and browse to the storage location. Do not edit the text directly.
Results Name	The name of the file in which to store the results.
Overwrite	Specifies whether to delete existing data at that path and file name. Yes - if the path exists, delete the existing file and save the results. No - fail if the path already exists.

Storage Format	Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet.
Compression	Select the type of compression for the output. Available Parquet compression options: GZIP, Deflate, Snappy, no compression. Available Avro compression options: Deflate, Snappy, no compression.
Use Spark	If Yes (the default), uses Spark to optimize calculation time.
Advanced Spark Settings Automatic Optimization	Yes specifies using the default Spark optimization settings. No enables providing customized Spark optimization. Click Edit Settings to customize Spark optimization. See Advanced Settings dialog for more information.

Output

Visual Output

The data rows of the output table or view are displayed, along with the new Variable columns, such as morethan3k in the example below.

To see all of the original columns in addition to the derived variables, select all columns for the Columns parameter.
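As a rough illustration of how a derived variable and the Columns selection combine in the output, consider the Python sketch below. It is not the operator's expression syntax. The `income` column, the sample rows, the helper names, and the meaning of morethan3k (a flag for values above 3000) are all assumptions made for the example.

```python
def add_variable(rows, name, expression):
    """Return new rows with one extra column computed per row by `expression`."""
    return [{**row, name: expression(row)} for row in rows]


def select_columns(rows, columns):
    """Keep only the named columns, in the given order."""
    return [{col: row[col] for col in columns} for row in rows]


# Hypothetical input data set.
rows = [
    {"id": 1, "income": 2500},
    {"id": 2, "income": 4800},
    {"id": 3, "income": 3000},
]

# Derived variable: assumed here to flag incomes strictly above 3000.
with_flag = add_variable(rows, "morethan3k", lambda r: r["income"] > 3000)

# Selecting only `id` hides `income`; selecting every column keeps it.
narrow = select_columns(with_flag, ["id", "morethan3k"])
full = select_columns(with_flag, ["id", "income", "morethan3k"])

print(narrow)
print(full)
```

In this sketch, `narrow` contains only the chosen column plus the derived variable, while `full` corresponds to selecting all columns.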

Data Output

A data set of the newly created table or view.

Additional Notes

The Variable operator also provides the following useful functions:

Parse data fields stored in a key-value pair format, such as JSON, dictionaries, and database STRUCT formats. For details, see Key-Value Pairs Parsing Example using the Variable Operator.
Convert datetime formats. For more information, see datetime Format Conversion Examples.
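The two functions above can be sketched conceptually in Python. This is an illustration of the ideas only, using the Python standard library rather than the operator's own functions. The helper names, the sample record, and the chosen datetime formats are assumptions.

```python
import json
from datetime import datetime


def extract_key(raw, key):
    """Parse a JSON object string and return the value stored under `key`.

    Returns None if the text is not valid JSON, is not an object,
    or does not contain the key.
    """
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    return parsed.get(key) if isinstance(parsed, dict) else None


def convert_datetime(text, in_format, out_format):
    """Re-express a datetime string from one format in another."""
    return datetime.strptime(text, in_format).strftime(out_format)


# Hypothetical field holding key-value data as JSON text.
record = '{"city": "Austin", "visits": 3}'
city = extract_key(record, "city")

# Hypothetical conversion from a US-style timestamp to an ISO-like one.
iso = convert_datetime("03/15/2021 14:30", "%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M:%S")

print(city)  # Austin
print(iso)   # 2021-03-15 14:30:00
```

Each extracted value or converted timestamp would become a new variable column in the output.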