Set Operations

This operator combines the results of two or more queries into a single result set.

Set operations operator icon

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter Description
Category Transform
Data source type TIBCO® Data Virtualization
Send output to other operators Yes
Data processing tool TIBCO® DV, Apache Spark 3.2 or later

Input

The operator requires two or more tabular data sets as inputs.

Note: For the current release (7.1.0), a maximum of two data sets is accepted as input.

Restrictions

  • The number of columns must be the same in all queries.
  • The data types of corresponding columns must be compatible.
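
The column-count restriction can be illustrated outside the product. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in SQL engine; it is not TIBCO® Data Virtualization or Apache Spark, and the tables a and b and the helper try_union are hypothetical. SQLite is dynamically typed, so the sketch demonstrates only the column-count rule, not the data type rule.

```python
import sqlite3

# Stand-in engine for illustration only (not TIBCO DV or Spark).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, score REAL)")  # hypothetical table
conn.execute("CREATE TABLE b (id INTEGER)")              # hypothetical table


def try_union(sql):
    """Run a query; return None on success or the engine's error message."""
    try:
        conn.execute(sql).fetchall()
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


# Two columns on the left, one on the right: the engine rejects the query.
mismatch_error = try_union("SELECT id, score FROM a UNION SELECT id FROM b")

# One column on each side: the query is accepted.
matched_error = try_union("SELECT id FROM a UNION SELECT id FROM b")

print("mismatched column counts:", mismatch_error)
print("matched column counts:", matched_error)
```

Selecting the same number of columns from each input, as in the second query, is the usual way to satisfy this restriction.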

Configuration

The following table provides the configuration details for the Set Operations operator.

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Sets Click Define Sets to display the Define Sets dialog. For more information, see the Define Sets dialog.
Output Schema Specify the schema for the output table or view.
Output Table Specify the table path and name where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results When set to Yes, the operator saves the results. If set to No, the operator does not save the results.

Output

Visual Output
A table that displays the data rows of the output table, limited to the maximum number of rows and columns that can be displayed.
Data Output
A tabular data set that contains the combined rows of the input data sets.

Example

The following example demonstrates how the Set Operations operator combines the results of the queries into a single result set. Here, the UNION operation returns every distinct row that appears in any of the queries.

Set Operations operator workflow
Data
credit: This data set contains the following information:
  • Multiple columns, namely id, times90dayslate, revolving_util, debt_ratio, credit_lines, monthly_income, times30dayslate_2years, and srsdlqncy.
  • Multiple rows (50,000 rows).
credit data set
demographics: This data set contains the following information:
  • Multiple columns, namely ID, AGE_IN_YEARS, LEVEL_OF_EDUCATION, YEARS_WITH_CURRENT_EMPLOYER, and YEARS_AT_CURRENT_ADDRESS.
  • Multiple rows (850 rows).
demographics data set
Parameter Setting
The parameter settings for the Set Operations operator are as follows:
  • Sets: UNION

    Alias credit demographics
    NEW_ID id ID
  • Store Results: Yes
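
The effect of these settings can be approximated in plain SQL. The following is a minimal sketch, assuming that the Sets definition corresponds to a UNION of credit.id and demographics.ID under the alias NEW_ID; the exact query that the operator generates is not documented here. It uses Python's built-in sqlite3 module as a stand-in engine, and the three-row tables are hypothetical miniatures, not the actual data sets.

```python
import sqlite3

# Stand-in engine and hypothetical miniature data (not the real data sets).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credit (id INTEGER, monthly_income REAL)")
conn.execute("CREATE TABLE demographics (ID INTEGER, AGE_IN_YEARS INTEGER)")
conn.executemany(
    "INSERT INTO credit VALUES (?, ?)",
    [(1, 5000.0), (2, 4200.0), (3, 6100.0)],
)
conn.executemany(
    "INSERT INTO demographics VALUES (?, ?)",
    [(2, 41), (3, 27), (4, 35)],
)

# UNION maps credit.id and demographics.ID onto one output column, NEW_ID,
# and keeps each distinct value once (IDs 2 and 3 occur in both inputs).
cursor = conn.execute(
    "SELECT id AS NEW_ID FROM credit "
    "UNION "
    "SELECT ID FROM demographics "
    "ORDER BY 1"
)
column_names = [col[0] for col in cursor.description]
new_ids = [row[0] for row in cursor.fetchall()]

print(column_names)
print(new_ids)
```

In this sketch, the output column takes its name from the alias and each ID appears once even when it is present in both inputs.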

Output
The following figure displays the output for the Set Operations operator. The result contains every distinct identifier (ID) found in either data set.
Set Operations operator output