Columns for Matching Rows from Each Table
|
From each of the datasets, select the matching columns and the conditions for the join.
- Click Add Condition to add more rows and conditions.
- Click Delete to delete a condition.
|
Join Type
|
- Specify when to include rows from each dataset, even if no matching row is found in the other dataset.
- Select use Pig join script to execute join to revert the defined join to a basic Pig-based join.
Note: Selecting this option disables the option for performing the replication in memory.
- Select the dataset for performing replication across nodes. This specifies whether to have the smaller dataset of the join replicated in memory for possible performance improvements.
Note: This makes sense only if neither dataset is large and holding one in memory yields faster join results.
|
Output
|
Click Join to filter and display only the columns selected from Input.
|
Input
|
Select the dataset to populate the Selected Fields for Output.
|
Selected Fields for Output File
|
- Select output columns: Select the columns from each input table to include in the output table.
- Alias: You can assign a unique table alias to each input table by clicking a column's alias field and changing its value.
|
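The Join Type and replication settings above can be illustrated with a minimal Python sketch. It is not the operator's implementation. The function name hash_join, the sample datasets, and the how values (inner, left, right, full) are assumptions made for illustration only. The sketch indexes one dataset in memory, which is roughly the idea behind replicating the smaller dataset, and then decides whether rows without a match are kept.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key, how="inner"):
    """Join two lists of dicts and return (left_row, right_row) pairs.

    how: 'inner', 'left', 'right', or 'full' (hypothetical names).
    A missing side of a pair is represented by None.
    """
    if how not in {"inner", "left", "right", "full"}:
        raise ValueError(f"unknown join type: {how}")

    # Hold the right-hand dataset in memory, indexed by its join column.
    # This is only practical when that dataset is small.
    index = defaultdict(list)
    for pos, row in enumerate(right):
        index[row[right_key]].append(pos)

    matched_right = set()
    result = []
    for lrow in left:
        positions = index.get(lrow[left_key], [])
        if positions:
            for pos in positions:
                matched_right.add(pos)
                result.append((lrow, right[pos]))
        elif how in ("left", "full"):
            # Keep the left row even though no right row matches.
            result.append((lrow, None))

    if how in ("right", "full"):
        # Keep right rows that never matched any left row.
        for pos, rrow in enumerate(right):
            if pos not in matched_right:
                result.append((None, rrow))
    return result


# Hypothetical sample data: one order has no customer, one customer has no order.
orders = [
    {"order_id": 1, "cust": "a"},
    {"order_id": 2, "cust": "b"},
    {"order_id": 3, "cust": "z"},
]
customers = [
    {"cust": "a", "city": "Oslo"},
    {"cust": "b", "city": "Lima"},
    {"cust": "c", "city": "Pune"},
]

for join_type in ("inner", "left", "right", "full"):
    rows = hash_join(orders, customers, "cust", "cust", join_type)
    print(join_type, len(rows))
```

With this sample data, only the join type changes which unmatched rows survive. The matched pairs are the same in every case.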