Distinct
This operator returns only distinct combinations of values from specific columns of a database source. Rows are not returned in any particular order, but each combination of values within a row is distinct from other rows.
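The behavior described above can be sketched in plain Python. This is only an illustration of the idea, not the operator's implementation: the function name `distinct_combinations` and the sample rows are hypothetical. A set is used because it holds each combination once and, like the operator, carries no ordering guarantee.

```python
from typing import Any, Iterable, Mapping, Sequence, Set, Tuple


def distinct_combinations(
    rows: Iterable[Mapping[str, Any]], columns: Sequence[str]
) -> Set[Tuple[Any, ...]]:
    """Return each distinct combination of values found in the chosen columns."""
    # Project every row onto the chosen columns; the set keeps one copy of each tuple.
    return {tuple(row[c] for c in columns) for row in rows}


# Hypothetical sample rows, used only for illustration.
rows = [
    {"id": 1, "color": "red", "size": "S"},
    {"id": 2, "color": "red", "size": "S"},   # same (color, size) as row 1
    {"id": 3, "color": "red", "size": "M"},
    {"id": 4, "color": "blue", "size": "S"},
]

result = distinct_combinations(rows, ["color", "size"])
print(len(result))  # number of distinct (color, size) combinations
```

Columns that are not selected (here `id`) play no part in the comparison, so rows 1 and 2 collapse into one combination.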
Information at a Glance
Note: This operator can only be used with TIBCO® Data Virtualization and Apache Spark 3.2 or later.
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | TIBCO® Data Virtualization |
| Send output to other operators | Yes |
| Data processing tool | TIBCO® DV, Apache Spark 3.2 or later |
Input
The input is a single tabular data set. Select the columns whose values should form the distinct combinations; the operator then computes them.
Bad or Missing Values
Missing values are included when determining distinct combinations. A missing value in a column is treated as distinct from any non-missing value.
This operator handles null values by eliminating them from the input calculation. To prevent this behavior, use the Null Value Replacement operator on the initial training data to replace bad or missing values.
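To make the effect of missing values concrete, the following Python sketch contrasts keeping a missing value as its own value with dropping combinations that contain one, and shows how replacing missing values first preserves those rows. It does not establish which behavior the operator applies in a given environment. All function names, the `"unknown"` placeholder, and the sample rows are hypothetical, and `None` stands in for a null.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

Row = Dict[str, Any]


def distinct_keep_missing(rows: List[Row], columns: Sequence[str]) -> Set[Tuple[Any, ...]]:
    """Treat None as a value in its own right when forming combinations."""
    return {tuple(row.get(c) for c in columns) for row in rows}


def distinct_drop_missing(rows: List[Row], columns: Sequence[str]) -> Set[Tuple[Any, ...]]:
    """Discard every combination that contains a missing value."""
    return {combo for combo in distinct_keep_missing(rows, columns) if None not in combo}


def replace_missing(rows: List[Row], columns: Sequence[str], placeholder: Any) -> List[Row]:
    """Return copies of the rows with None replaced by a placeholder in the given columns."""
    return [
        {**row, **{c: placeholder for c in columns if row.get(c) is None}}
        for row in rows
    ]


# Hypothetical sample rows with missing values.
rows = [
    {"color": "red", "size": "S"},
    {"color": None, "size": "S"},
    {"color": None, "size": "S"},
    {"color": "red", "size": None},
]
cols = ["color", "size"]

kept = distinct_keep_missing(rows, cols)
dropped = distinct_drop_missing(rows, cols)
replaced = distinct_drop_missing(replace_missing(rows, cols, "unknown"), cols)
```

Replacing missing values before computing the combinations keeps the affected rows in the result even when combinations containing nulls would otherwise be dropped.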
Configuration
The following table provides the configuration details for the Distinct operator.
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Distinct Columns | Specify the columns from the data source by which to generate rows of data, where each row has a distinct combination of column values. |
| Output Schema | Specify the schema for the output table or view. |
| Output Table | Specify the table path and name where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
| Store Results | When set to Yes, the operator saves the results. If set to No, the operator does not save the results. |
Output
Visual Output
A table that displays the distinct combinations of values from selected columns of a data set.
Output to Successive Operator
A single tabular data set with distinct combinations of values from specific columns.
Example
The following example illustrates the Distinct operator.
Data
golf: This data set contains the following information:
- Multiple columns, namely outlook, temperature, wind, humidity, and play.
- Multiple rows (14 rows).
Parameter Setting
The parameter settings for the golf data set are as follows:
- Distinct Columns: outlook, play
- Store Results: Yes
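As a rough analogue of these settings, the following Python sketch runs a `SELECT DISTINCT` query over the two selected columns using the standard-library `sqlite3` module. It assumes the operator behaves like a SQL `SELECT DISTINCT`, which this page does not confirm. The table name `golf_sample` and its six rows are made up for illustration and are not the actual 14-row golf data set. The `ORDER BY` clause is added only to make the printed result repeatable; the operator itself does not guarantee any row order.

```python
import sqlite3

# In-memory database with a hypothetical sample that mirrors the golf column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE golf_sample ("
    "outlook TEXT, temperature INTEGER, wind TEXT, humidity INTEGER, play TEXT)"
)

# Made-up rows, NOT the real golf data set.
sample_rows = [
    ("sunny", 85, "false", 85, "no"),
    ("sunny", 80, "true", 90, "no"),
    ("overcast", 83, "false", 78, "yes"),
    ("rain", 70, "false", 96, "yes"),
    ("rain", 65, "true", 70, "no"),
    ("overcast", 64, "true", 65, "yes"),
]
conn.executemany("INSERT INTO golf_sample VALUES (?, ?, ?, ?, ?)", sample_rows)

# Distinct Columns: outlook, play
distinct_rows = conn.execute(
    "SELECT DISTINCT outlook, play FROM golf_sample ORDER BY outlook, play"
).fetchall()
conn.close()

for outlook, play in distinct_rows:
    print(outlook, play)
```

The six sample rows reduce to four combinations because the two sunny rows and the two overcast rows each share the same outlook and play values.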
Output
The following figure displays the output for the parameter settings for the golf data set.