Set Operations (HD)
Combines the results of two or more queries into a single result set.
Information at a Glance
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | HD |
| Send output to other operators | Yes |
| Data processing tool | MapReduce / Spark |
Note: The Set Operations (HD) operator is for Hadoop data only. For database data, use the
Set Operations (DB) operator.
Input
Two or more Hadoop data sets.
Restrictions
- The number and the order of the columns must be the same in all queries.
- The data types must be compatible.
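The restrictions above can be illustrated with a minimal sketch in plain Python. It is not the operator's implementation: the schema representation, the compatibility rule (identical types, or both numeric), and the function names are assumptions made for illustration. The set operations shown (union, intersect, except) are common examples only; the operations the operator actually offers are configured in the Define Sets dialog.

```python
# Hypothetical schema: an ordered list of (column name, type name) pairs.
NUMERIC_TYPES = {"int", "long", "float", "double"}


def check_compatible(schemas):
    """Raise ValueError unless every schema matches the first one in
    column count and, position by position, in compatible type."""
    first = schemas[0]
    for other in schemas[1:]:
        if len(other) != len(first):
            raise ValueError("column counts differ")
        for (_, t1), (_, t2) in zip(first, other):
            # Assumed rule: same type, or both types numeric.
            if t1 != t2 and not {t1, t2} <= NUMERIC_TYPES:
                raise ValueError(f"incompatible types: {t1} vs {t2}")


def union_all(a, b):
    """All rows of a followed by all rows of b, duplicates kept."""
    return a + b


def union(a, b):
    """Distinct rows of a and b, in first-seen order."""
    return list(dict.fromkeys(a + b))


def intersect(a, b):
    """Distinct rows of a that also appear in b."""
    in_b = set(b)
    return [row for row in dict.fromkeys(a) if row in in_b]


def except_(a, b):
    """Distinct rows of a that do not appear in b."""
    in_b = set(b)
    return [row for row in dict.fromkeys(a) if row not in in_b]


# Columns are matched by position, so the order must agree across inputs.
schema_a = [("id", "int"), ("label", "string")]
schema_b = [("key", "long"), ("name", "string")]
check_compatible([schema_a, schema_b])

rows_a = [(1, "x"), (2, "y"), (2, "y")]
rows_b = [(2, "y"), (3, "z")]
print(union(rows_a, rows_b))
print(intersect(rows_a, rows_b))
print(except_(rows_a, rows_b))
```

Because rows are compared as whole tuples, swapping two columns in one input would make otherwise identical rows look different, which is why the column order must match in every query.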
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Sets | Click Define Sets to display the Define Sets dialog. For more information, see the Define Sets dialog. |
| Store Results? | Specifies whether to store the results. |
| Results Location | The HDFS directory where the results of the operator are stored. This is the main directory, the sub-directory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer dialog and browse to the storage location. Do not edit the text directly. |
| Results Name | The name of the file in which to store the results. |
| Overwrite | Specifies whether to delete existing data at that path and file name. |
| Storage Format | Select the format in which to store the results. The storage format is determined by your type of operator. Typical formats are Avro, CSV, TSV, or Parquet. |
| Compression | Select the type of compression for the output. The available options depend on whether the storage format is Parquet or Avro. |
| Use Spark | If Yes (the default), uses Spark to optimize calculation time. |
| Advanced Spark Settings Automatic Optimization | |
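As a rough illustration of how Results Location, Results Name, and Overwrite relate, the sketch below composes an output path and reports whether existing data would be deleted. The function names and the example values are hypothetical, and the sketch only models the settings as described in the table; it does not call HDFS or reproduce the operator's behavior.

```python
import posixpath


def results_path(results_location, results_name):
    """Join the main results directory and the results name
    using HDFS-style forward slashes."""
    return posixpath.join(results_location, results_name)


def deletes_existing(path, existing_paths, overwrite):
    """True when Overwrite is enabled and data already exists at the path."""
    return overwrite and path in existing_paths


# Hypothetical values, for illustration only.
path = results_path("/user/demo/results", "set_ops_out")
existing = {"/user/demo/results/set_ops_out"}
print(path)
print(deletes_existing(path, existing, overwrite=True))
```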
Output
Visual Output
The data rows of the output table or view. Up to 200 rows are displayed.
Data Output
A single data set that combines the input data sets. This operator always creates a CSV output.