Variable (HD)
Defines new variables derived from data fields of the input data set, producing a new table or view.
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | HD |
| Send output to other operators | Yes |
| Data processing tool | Pig |
The Variable operator can also divide the data rows into quantiles and add the resulting quantile variables to the data. Dividing the data into progressively smaller groups (quantiles) helps reveal the overall distribution of the data.
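As an illustration of the idea only, the following Python sketch shows two ways a quantile variable can be derived from a numeric column: equal-frequency bins created automatically, and bins defined by manually chosen upper bounds. The function names and the sample `ages` column are hypothetical, and the sketch is not the operator's actual implementation; how Average Ascend and Customize compute their bins internally is an assumption here.

```python
from bisect import bisect_left


def equal_frequency_bins(values, n_bins):
    """Assign each value a 1-based bin number so that bins hold roughly equal row counts.

    Assumption: bins are formed by ascending rank. Tied values may land in
    different bins under this simple scheme.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values) + 1
    return bins


def custom_bins(values, upper_bounds):
    """Assign each value to the first bin whose upper bound is >= the value.

    Values above the last bound fall into one extra, final bin.
    upper_bounds must be sorted in ascending order.
    """
    return [bisect_left(upper_bounds, v) + 1 for v in values]


# Hypothetical input column.
ages = [23, 45, 31, 62, 18, 54, 39, 27]

age_quartile = equal_frequency_bins(ages, 4)   # automatic, equal-count bins
age_band = custom_bins(ages, [25, 40, 60])     # manually defined bins

for age, q, b in zip(ages, age_quartile, age_band):
    print(age, q, b)
```

With equal-frequency bins, each of the four bins receives two of the eight rows. With the custom bounds, the bin sizes follow the data instead, which is the trade-off between the two approaches.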
Input
An operator that can output a data set.
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Variables | Define the expression(s) to create the new Variable column(s). For details, see Define Variables dialog and Define Quantile Variables dialog. |
| Quantile Variables | If the new variable is a quantile variable, select the column(s) from which to derive the quantiles. The possible quantile types are Average Ascend (the bins are created automatically) and Customize (the bins are defined manually). |
| Columns | See Select Columns dialog. |
| Store Results? | Specifies whether to store the results. |
| Results Location | The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer dialog and browse to the storage location. Do not edit the text directly. |
| Results Name | The name of the file in which to store the results. |
| Overwrite | Specifies whether to delete existing data at that path and file name. |
| Storage Format | Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet. |
| Compression | Select the type of compression for the output. For the options, see Available Parquet compression options and Available Avro compression options. |
| Use Spark | If Yes (the default), uses Spark to optimize calculation time. |
| Advanced Spark Settings Automatic Optimization | |
Output
To see all of the data rows in addition to the derived variables, select all columns for the Columns parameter.

Additional Notes
The Variable operator also provides the following useful functions.
- Parse data fields stored in a key-value pair format, such as JSON, dictionaries, and database STRUCT formats. For details, see Key-Value Pairs Parsing Example using the Variable Operator.
- Convert datetime formats. For more information, see datetime Format Conversion Examples.
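
The two functions listed above can be sketched in Python to show what each transformation does to a single row. This is a rough illustration under stated assumptions, not the operator's expression syntax: the helper names `parse_field` and `convert_datetime`, the sample row, and the chosen format strings are all hypothetical.

```python
import json
from datetime import datetime


def parse_field(raw, key, default=None):
    """Extract one key from a JSON-encoded field.

    Returns default if the field is not valid JSON, is not an object,
    or does not contain the key.
    """
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError):
        return default
    if isinstance(obj, dict):
        return obj.get(key, default)
    return default


def convert_datetime(text, in_format, out_format):
    """Re-express a datetime string in a different format."""
    return datetime.strptime(text, in_format).strftime(out_format)


# Hypothetical input row: one key-value field and one datetime field.
row = {
    "payload": '{"city": "Austin", "zip": "78701"}',
    "signup": "03/15/2021 14:30",
}

city = parse_field(row["payload"], "city")
signup_iso = convert_datetime(row["signup"], "%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M:%S")

print(city)
print(signup_iso)
```

Returning a default for malformed input, rather than raising an error, keeps one bad row from stopping the whole transformation. Whether the operator behaves this way is not stated in this topic.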