Define Join Conditions Dialog Box (Hadoop)

For Join operations in a Hadoop data source, you can specify conditions for the two sources, including whether to include all records in one or both data sources, and whether to use a Pig join script.

Parameter Description
Columns for Matching Rows from Each Table
  From each of the datasets, select the matching columns and the conditions for the join.
  • Click Add Condition to add more rows and conditions.
  • Click Delete to delete a condition.
Join Type
  • Specify when to include rows from each dataset, even if no matching row is found in the other dataset.
  • Select the "use Pig join script to execute join" option to revert the defined join to a basic Pig-based join.
    Note: Selecting this option disables the option for performing the replication in memory.
  • Select the dataset for performing replication across nodes. This specifies whether the smaller dataset of the join is replicated in memory for possible performance improvements.
    Note: This makes sense only if neither dataset is large and holding one in memory provides faster join results.
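The Join Type options above can be illustrated with a small Python sketch. It is not the code that the dialog generates. It only shows the general idea of holding the smaller dataset in memory as a lookup table while the other dataset is streamed past it, and of optionally keeping rows that have no match. The function name replicated_join, its parameters, and the sample column names are hypothetical, and the sketch assumes a single equality condition on one column per dataset.

```python
from collections import defaultdict


def replicated_join(large, small, large_key, small_key,
                    keep_unmatched_large=False, keep_unmatched_small=False):
    """Join two lists of dicts by holding `small` in memory (illustrative only).

    Assumes the two datasets use different column names; on a clash the
    value from `small` overwrites the value from `large`.
    """
    # Build the in-memory lookup from the smaller ("replicated") dataset.
    lookup = defaultdict(list)
    for row in small:
        lookup[row[small_key]].append(row)

    large_cols = list(large[0]) if large else []
    small_cols = list(small[0]) if small else []
    matched_keys = set()
    joined = []

    # Stream the larger dataset and probe the lookup for each row.
    for row in large:
        matches = lookup.get(row[large_key], [])
        if matches:
            matched_keys.add(row[large_key])
            joined.extend({**row, **m} for m in matches)
        elif keep_unmatched_large:
            # No match: keep the row and fill the other side with None.
            joined.append({**row, **dict.fromkeys(small_cols)})

    if keep_unmatched_small:
        # Add rows of the smaller dataset that never matched.
        for key, rows in lookup.items():
            if key not in matched_keys:
                joined.extend({**dict.fromkeys(large_cols), **r} for r in rows)
    return joined


orders = [{"order_id": 1, "cust_id": 10}, {"order_id": 2, "cust_id": 99}]
customers = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Lin"}]

matching_only = replicated_join(orders, customers, "cust_id", "id")
all_orders = replicated_join(orders, customers, "cust_id", "id",
                             keep_unmatched_large=True)
all_rows = replicated_join(orders, customers, "cust_id", "id",
                           keep_unmatched_large=True, keep_unmatched_small=True)
```

With both flags off, only matching rows are returned. Turning one flag on keeps every row from that dataset, and turning both on keeps every row from both. The lookup table is the reason the note above applies: the replicated dataset must fit comfortably in memory for this approach to pay off.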
Output
  Click the Join to filter and display only the columns selected from Input.
Input
  Select the dataset to populate the Selected Fields for Output.
Selected Fields for Output File
  • Select output columns: Select the columns from each input table to include in the output table.
  • Alias: You can assign a unique alias for each input table by clicking a column's alias field and changing its value.
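The effect of selecting output columns and assigning aliases can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. The function select_output, the (table, column, alias) tuple layout, and the sample table names are hypothetical. The sketch assumes that each joined row keeps its input tables' columns separate, so that two input columns with the same name can be told apart until an alias is applied.

```python
def select_output(joined_rows, selected):
    """Keep only the selected columns, renamed to their aliases.

    joined_rows: list of dicts shaped like {table_name: {column: value}}.
    selected:    list of (table, column, alias) tuples in output order.
    """
    aliases = [alias for _, _, alias in selected]
    if len(set(aliases)) != len(aliases):
        # Two output columns cannot share a name.
        raise ValueError("aliases must be unique")
    return [
        {alias: row[table][column] for table, column, alias in selected}
        for row in joined_rows
    ]


joined = [
    {"orders": {"id": 1, "cust_id": 10}, "customers": {"id": 10, "name": "Ada"}},
]
selected = [
    ("orders", "id", "order_id"),        # both inputs have an "id" column,
    ("customers", "id", "customer_id"),  # so each gets a distinct alias
    ("customers", "name", "name"),
]
output = select_output(joined, selected)
```

Columns that are not listed in the selection, such as cust_id here, are dropped from the output.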