Join

This operator joins data sets. You define an alias for each input data set, the output columns, and the join condition.

Join operator

Information at a Glance

Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter Description
Category Transform
Data source type TIBCO® Data Virtualization
Send output to other operators Yes
Data processing tool TIBCO® DV, Apache Spark 3.2 or later

Input

The operator requires two or more input data sets.

Note: For the current release (7.1.0), a maximum of two data sets is accepted as input.

Both tables must be located in the same database. Join does not work on tables located in different databases. See the TIBCO Data Science - Team Studio Operator and Data Source Compatibility for any data source exceptions for the Join operator.

Bad or Missing Values
Null values are not allowed and result in an error.
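
Because null values cause the operator to fail, it can help to check the input data before running the workflow. The following is a minimal sketch of such a check in plain Python, not part of the product. The function name `find_null_rows` and the sample rows are hypothetical and for illustration only.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def find_null_rows(
    rows: Sequence[Mapping[str, Any]], columns: Iterable[str]
) -> List[int]:
    """Return the indexes of rows that have a null (None) in any given column."""
    cols = list(columns)
    return [
        index
        for index, row in enumerate(rows)
        if any(row.get(col) is None for col in cols)
    ]


# Illustrative rows only; these are not taken from an actual data set.
sample_rows = [
    {"outlook": "sunny", "temperature": 85},
    {"outlook": None, "temperature": 80},
]

# Hypothetical pre-check: collect rows that would need cleaning before the join.
rows_with_nulls = find_null_rows(sample_rows, ["outlook", "temperature"])
```

Rows reported by a check like this can be removed or repaired upstream before they reach the Join operator.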

Configuration

The following table provides the configuration details for the Join operator.

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Create Sequence ID Click Yes to create an ID column on the output data set of the Join operator.
Join Conditions Click Define Join Conditions to display the Join Properties dialog. See Join Properties - Database dialog for more information.

For information about creating the Join condition, see Creating a Join condition for a database join.

Output Schema Specify the schema for the output table or view.
Output Table Specify the table path and name where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Store Results When set to Yes, the operator saves the results. If set to No, the operator does not save the results.
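
To illustrate the idea behind the Create Sequence ID parameter, the following sketch adds an incrementing ID column to a list of rows in plain Python. It is an assumption-based illustration, not the operator's implementation: the column name `seq_id`, the starting value of 1, and the function name `add_sequence_id` are all hypothetical, and the operator's actual column name and numbering might differ.

```python
from typing import Any, Dict, List, Mapping, Sequence


def add_sequence_id(
    rows: Sequence[Mapping[str, Any]],
    id_column: str = "seq_id",  # hypothetical column name
    start: int = 1,  # assumed starting value
) -> List[Dict[str, Any]]:
    """Return copies of the rows with an incrementing ID column added first."""
    return [
        {id_column: number, **row}
        for number, row in enumerate(rows, start=start)
    ]


# Illustrative joined rows only.
joined_rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
]
rows_with_id = add_sequence_id(joined_rows)
```

The input rows are left unchanged; each output row is a new dictionary with the ID as its first key.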

Output

Visual Output
  • Output: A table that displays the output of joined data sets.

Example

The following example joins the golf data set and golf-1 data set into a single data set using the Join operator.

Join operator workflow
Data

golf: This data set contains the following information:

  • Five columns: outlook, temperature, wind, humidity, and play.
  • 14 rows.

golf-1: This data set contains the following information:

  • Five columns: outlook, temperature, wind, humidity, and play.
  • 14 rows.

Parameter Setting
The parameter settings for the golf and golf-1 data sets are as follows:
  • Create Sequence ID: No

  • Join Conditions: outlook, temperature, humidity, wind

  • Store Results: Yes
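
The join condition above can be sketched as an equivalent SQL query. The following example uses Python's built-in `sqlite3` module with an in-memory database purely for illustration; it does not use TIBCO Data Virtualization or Spark. It assumes an inner join, which the settings above do not state. The table name `golf_1` (an underscore avoids quoting the hyphen), the aliases `a` and `b`, the output column names `play_a` and `play_b`, and the two rows per table are all hypothetical and do not reproduce the actual 14-row data sets.

```python
import sqlite3

# Columns used in the join condition, as listed in the parameter settings.
JOIN_COLUMNS = ["outlook", "temperature", "humidity", "wind"]

conn = sqlite3.connect(":memory:")
for table in ("golf", "golf_1"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "outlook TEXT, temperature INTEGER, wind TEXT, "
        "humidity INTEGER, play TEXT)"
    )

# Illustrative rows only (outlook, temperature, wind, humidity, play).
golf_rows = [
    ("sunny", 85, "false", 85, "no"),
    ("overcast", 83, "false", 78, "yes"),
]
golf_1_rows = [
    ("sunny", 85, "false", 85, "no"),
    ("rain", 70, "false", 96, "yes"),
]
conn.executemany("INSERT INTO golf VALUES (?, ?, ?, ?, ?)", golf_rows)
conn.executemany("INSERT INTO golf_1 VALUES (?, ?, ?, ?, ?)", golf_1_rows)

# Build the equality condition: a.outlook = b.outlook AND ...
condition = " AND ".join(f"a.{col} = b.{col}" for col in JOIN_COLUMNS)
query = (
    "SELECT a.outlook, a.temperature, a.humidity, a.wind, "
    "a.play AS play_a, b.play AS play_b "
    f"FROM golf AS a INNER JOIN golf_1 AS b ON {condition}"
)
joined = conn.execute(query).fetchall()
conn.close()
```

With these sample rows, only the row whose four join columns match in both tables appears in `joined`.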

Output
The following figure displays the output for the parameter settings for the given data sets.
Join operator Output