Join (HD)
Performs a table join on the input data sets. You define an alias for each input data set, the output columns, and the join condition.
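The three things you define (data set aliases, output columns, and a join condition) can be illustrated with a minimal Python sketch of an inner equi-join. This is only a conceptual illustration and not how the operator is implemented on Pig, MapReduce, or Spark. The function name `inner_join`, the aliases `c` and `o`, and the sample data are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row],
               left_alias: str, right_alias: str,
               left_key: str, right_key: str,
               output_columns: List[str]) -> List[Row]:
    """Hypothetical inner equi-join: left.left_key == right.right_key."""
    # Index the right-hand rows by their join key.
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)

    joined: List[Row] = []
    for l in left:
        # Probe the index; left rows with no match are dropped (inner join).
        for r in index.get(l[left_key], []):
            # Qualify each column with its data set alias to avoid name clashes.
            merged = {f"{left_alias}.{k}": v for k, v in l.items()}
            merged.update({f"{right_alias}.{k}": v for k, v in r.items()})
            # Keep only the requested output columns.
            joined.append({c: merged[c] for c in output_columns})
    return joined


# Hypothetical sample data standing in for the two input data sets.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"cust_id": 1, "total": 30},
          {"cust_id": 1, "total": 12},
          {"cust_id": 3, "total": 5}]

result = inner_join(customers, orders, "c", "o", "id", "cust_id",
                    ["c.name", "o.total"])
print(result)
```

In this sketch, customer 2 and the order for customer 3 have no match, so neither appears in the result. Other join types would treat such unmatched rows differently.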
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | HD |
| Send output to other operators | Yes |
| Data processing tool | Pig, MapReduce |
Note: The Join (HD) operator is for Hadoop data only. For database data, use the Join (DB) operator.
Input
Accepts two data sets.
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Join Conditions | Click Define Join Conditions to select the appropriate columns for joining the data and to specify the join type and output fields. For more information, see Define Join Conditions dialog (Hadoop). |
| Store Results? | Specifies whether to store the results. |
| Results Location | The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer dialog and browse to the storage location. Do not edit the text directly. |
| Results Name | The name of the file in which to store the results. |
| Overwrite | Specifies whether to delete existing data at that path and file name. |
| Storage Format | Select the format in which to store the results. The available formats depend on the type of operator. Typical formats are Avro, CSV, TSV, and Parquet. |
| Compression | Select the type of compression for the output. The available options depend on the storage format; separate sets of options exist for Parquet and for Avro. |
| Use Spark | If Yes (the default), uses Spark to optimize calculation time. |
| Advanced Spark Settings Automatic Optimization | |
Output
Visual Output
The joined rows of the output table or view are displayed in the Results window (up to 200 rows).

Data Output
A data set containing the joined rows of the output table or view, which can be sent to other operators.