Set Operations
This operator combines the results of two or more queries into a single result set.
Information at a Glance
Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | TIBCO® Data Virtualization |
| Send output to other operators | Yes |
| Data processing tool | TIBCO® DV, Apache Spark 3.2 or later |
Input
The operator requires two or more tabular data sets as inputs.
Note: For the current release (7.1.0), a maximum of two data sets is accepted as inputs.
Restrictions
- The number of columns must be the same in all queries.
- The data types of corresponding columns must be compatible.
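The two restrictions can be checked before the operator runs. The following is a minimal sketch that assumes standard SQL behavior, where columns are paired by position. The function name `check_union_compatible` and the `TYPE_FAMILIES` grouping are hypothetical illustrations, not the compatibility rules that TIBCO DV or Spark actually apply.

```python
# Hypothetical type grouping used only for this illustration; it is an
# assumption, not the actual compatibility table used by TIBCO DV or Spark.
TYPE_FAMILIES = {
    "INTEGER": "numeric",
    "BIGINT": "numeric",
    "DOUBLE": "numeric",
    "DECIMAL": "numeric",
    "VARCHAR": "text",
    "CHAR": "text",
    "DATE": "temporal",
    "TIMESTAMP": "temporal",
}


def check_union_compatible(schema_a, schema_b):
    """Return a list of problems; an empty list means the inputs look compatible.

    Each schema is a list of (column_name, type_name) pairs in query order.
    """
    problems = []
    # Restriction 1: both inputs must have the same number of columns.
    if len(schema_a) != len(schema_b):
        problems.append(f"column count differs: {len(schema_a)} vs {len(schema_b)}")
        return problems
    # Restriction 2: columns are paired by position (standard SQL behavior),
    # and each pair must fall into the same type family. Unknown types are
    # treated as incompatible.
    for position, ((name_a, type_a), (name_b, type_b)) in enumerate(
        zip(schema_a, schema_b), start=1
    ):
        family_a = TYPE_FAMILIES.get(type_a.upper())
        family_b = TYPE_FAMILIES.get(type_b.upper())
        if family_a is None or family_a != family_b:
            problems.append(
                f"column {position} ({name_a} {type_a} / {name_b} {type_b}) is not compatible"
            )
    return problems


print(check_union_compatible([("id", "INTEGER")], [("ID", "BIGINT")]))  # []
```

Returning a list of problems, rather than raising on the first one, lets all mismatches be reported at once.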
Configuration
The following table provides the configuration details for the Set Operations operator.
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Sets | Click Define Sets to display the Define Sets dialog. For more information, see the Define Sets dialog. |
| Output Schema | Specify the schema for the output table or view. |
| Output Table | Specify the table path and name where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
| Store Results | When set to Yes, the operator saves the results. If set to No, the operator does not save the results. |
Output
Visual Output
A table that displays the data rows of the output table, limited by the maximum number of rows and columns that can be displayed.
Data Output
A tabular data set that contains the combined result of the set operation.
Example
The following example uses the Set Operations operator to combine the results of two queries into a single result set. The UNION operation returns every distinct row that appears in at least one of the queries.
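The UNION semantics described above can be illustrated with a short sketch that models each query result as a list of row tuples. The function `union_distinct` and the sample IDs are hypothetical. SQL does not guarantee the row order of a UNION; the sketch keeps first-seen order only to make the output easy to read.

```python
def union_distinct(*queries):
    """Return every distinct row that appears in at least one input."""
    seen = set()
    result = []
    for rows in queries:
        for row in rows:
            # Keep only the first occurrence of each row; later duplicates,
            # whether from the same input or another one, are dropped.
            if row not in seen:
                seen.add(row)
                result.append(row)
    return result


# Hypothetical sample rows, one ID column per input.
credit_ids = [(1,), (2,), (3,), (3,)]
demographics_ids = [(2,), (3,), (4,)]
print(union_distinct(credit_ids, demographics_ids))  # [(1,), (2,), (3,), (4,)]
```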
Data
Credit: This data set contains the following information:
- Columns: id, times90dayslate, revolving_util, debt_ratio, credit_lines, monthly_income, times30dayslate_2years, and srsdlqncy.
- 50,000 rows.
demographics: This data set contains the following information:
- Columns: ID, AGE_IN_YEARS, LEVEL_OF_EDUCATION, YEARS_WITH_CURRENT_EMPLOYER, and YEARS_AT_CURRENT_ADDRESS.
- 850 rows.
Parameter Setting
The parameter settings for the Set Operations operator are as follows:
- Sets: UNION, with the following column mapping:

| Alias | credit | demographics |
|---|---|---|
| NEW_ID | id | ID |

- Store Results: Yes
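These settings roughly correspond to a SQL UNION in which the credit `id` column and the demographics `ID` column are combined under the alias NEW_ID. The sketch below runs such a query with Python's built-in `sqlite3` module. It is an assumption about the equivalent query, not the SQL that TIBCO DV or Spark actually generates, and the table contents are hypothetical miniatures of the two data sets.

```python
import sqlite3

# Hypothetical miniature versions of the two inputs; the real data sets have
# 50,000 (credit) and 850 (demographics) rows and more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credit (id INTEGER, debt_ratio REAL)")
conn.execute("CREATE TABLE demographics (ID INTEGER, AGE_IN_YEARS INTEGER)")
conn.executemany("INSERT INTO credit VALUES (?, ?)", [(1, 0.2), (2, 0.5), (3, 0.1)])
conn.executemany("INSERT INTO demographics VALUES (?, ?)", [(2, 41), (3, 35), (4, 29)])

# UNION removes duplicate rows. The output column takes its name from the
# first SELECT, so the alias NEW_ID labels the combined column.
# ORDER BY 1 sorts by the first output column for a stable printout.
query = """
    SELECT id AS NEW_ID FROM credit
    UNION
    SELECT ID FROM demographics
    ORDER BY 1
"""
cursor = conn.execute(query)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(column_names)  # ['NEW_ID']
print(rows)          # [(1,), (2,), (3,), (4,)]
```

IDs 2 and 3 appear in both tables but only once in the result, which mirrors the example output of unique IDs from both data sets.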
Output
The following figure displays the output of the Set Operations operator. The result lists every unique identifier (ID) that appears in either data set.