Variable (HD)

Use this operator to define new variables from the data fields of the input data set. The result is a new table or view.

Information at a Glance

Category: Transform
Data source type: HD
Sends output to other operators: Yes
Data processing tool: Pig
Note: The Variable (HD) operator is for Hadoop data only. For database data, use the Variable (DB) operator.
Important: The created variables are static. They cannot change dynamically at run time.

The Variable operator can also divide the data rows into quantiles and add the resulting quantile variables to the data. Dividing the data into progressively smaller groups (quantiles) helps you understand the overall distribution of the data.
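As an illustration of what a quantile variable contains, the following Python sketch assigns each row a bin number so that every bin holds roughly the same number of rows. It is not the operator's implementation (the operator runs on Pig). The function name `assign_quantile_bins`, the sample values, and the equal-count binning rule are assumptions made for this example.

```python
def assign_quantile_bins(values, num_bins):
    """Return a 1-based bin number for each value, using equal-count bins."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    # Rank the rows by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto bins 1..num_bins.
        bins[idx] = rank * num_bins // len(values) + 1
    return bins


# Hypothetical input column.
incomes = [1200, 5400, 3100, 800, 2500, 4700, 3900, 1500]
income_quartile = assign_quantile_bins(incomes, 4)
print(income_quartile)  # [1, 4, 3, 1, 2, 4, 3, 2]
```

Each row keeps its original position, and the new column records only which quantile the row falls into.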

Input

An operator that can output a data set.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Variables Define the expression(s) to create the new Variable column(s).

For details, see Define Variables dialog box and Define Quantile Variables dialog box.

Quantile Variables If the new variable to create is a quantile variable, select the required column(s) to use for deriving the quantiles.

The possible quantile types are Average Ascend (which creates the bins automatically) and Customize (which lets you define the bins manually).

Columns See Select Columns dialog box.
Store Results? Specifies whether to store the results.
  • true - results are stored.
  • false - the data set is passed to the next operator without storing.
Results Location The HDFS directory where the results of the operator are stored. This is the main directory; its subdirectory is specified in Results Name. Click Choose File to open the Hadoop File Explorer Dialog Box and browse to the storage location. Do not edit the text directly.
Results Name The name of the file in which to store the results.
Overwrite Specifies whether to delete existing data at that path and file name.
  • Yes - if the path exists, the existing file is deleted and the results are saved.
  • No - the operator fails if the path already exists.
Storage Format Select the format in which to store the results. The storage format is determined by your type of operator.

Typical formats are Avro, CSV, TSV, or Parquet.

Compression Select the type of compression for the output.
Available Parquet compression options:
  • GZIP
  • Deflate
  • Snappy
  • no compression

Available Avro compression options:

  • Deflate
  • Snappy
  • no compression
Use Spark If Yes (the default), uses Spark to optimize calculation time.
Advanced Spark Settings Automatic Optimization
  • Yes uses the default Spark optimization settings.
  • No lets you provide customized Spark optimization settings. Click Edit Settings to customize them. See Advanced Settings Dialog Box for more information.
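
For the Customize quantile type described under Quantile Variables, the bins are defined manually. The following Python sketch shows one way such bins could be applied, assuming each bin is given by an inclusive upper bound and that values above the last bound fall into one extra bin. The function name `assign_custom_bins`, the bounds, and the boundary rule are hypothetical; the source does not specify how the operator treats boundary values.

```python
from bisect import bisect_left


def assign_custom_bins(values, upper_bounds):
    """Return a 1-based bin number per value from ascending upper bounds."""
    bounds = list(upper_bounds)
    if bounds != sorted(bounds):
        raise ValueError("upper_bounds must be sorted in ascending order")
    # bisect_left counts the bounds strictly below the value, so a value
    # equal to a bound stays in that bound's bin (inclusive upper bound).
    return [bisect_left(bounds, v) + 1 for v in values]


# Hypothetical bins: up to 1000, 1001 to 3000, and above 3000.
incomes = [1200, 5400, 3100, 800, 2500]
print(assign_custom_bins(incomes, [1000, 3000]))  # [2, 3, 3, 1, 2]
```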

Output

Visual Output
The data rows of the output table or view are displayed, including the new Variable columns (such as morethan3k in the example below).

To see all of the data rows in addition to the derived variables, select all columns for the Columns parameter.
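
The relationship between a derived variable and the Columns parameter can be sketched in Python as follows. This is only an illustration, not the operator's Pig implementation. The `income` column, the sample rows, the helper `add_variable`, and the reading of morethan3k as "income greater than 3000" are all assumptions made for this example.

```python
def add_variable(rows, name, expression, keep_columns):
    """Return new rows holding the kept columns plus one derived column."""
    result = []
    for row in rows:
        # Only the selected input columns are carried into the output.
        new_row = {col: row[col] for col in keep_columns}
        # The derived variable is computed from the full input row.
        new_row[name] = expression(row)
        result.append(new_row)
    return result


# Hypothetical input data set.
rows = [
    {"id": 1, "income": 2500},
    {"id": 2, "income": 4100},
    {"id": 3, "income": 3000},
]

output = add_variable(
    rows,
    "morethan3k",
    lambda r: r["income"] > 3000,
    keep_columns=["id", "income"],
)
for row in output:
    print(row)
```

If `keep_columns` were empty, each output row would contain only the derived morethan3k value, which is why all columns must be selected to see the original data alongside the new variable.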



Data Output
A data set of the newly created table or view.

Additional Notes

The Variable operator also provides the following useful functions.