Distinct (DB)
Returns only the distinct combinations of values from the specified columns of a database source. Rows are returned in no particular order, and no two rows contain the same combination of values.
Information at a Glance
Note: You can also use this operator in a workflow that uses TIBCO® Data Virtualization and Apache Spark 3.2 or later.
| Parameter | Description |
|---|---|
| Category | Transform |
| Data source type | DB |
| Send output to other operators | Yes |
| Data processing tool | SQL |
Note: The Distinct (DB) operator is for database data only. For Hadoop data, use the Distinct (HD) operator.
Input
A database source. Users choose the columns from which they want distinct combinations, and the operator performs the calculation.
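The selection described above corresponds roughly to a `SELECT DISTINCT` over the chosen columns. The following is a minimal sketch only, using Python's built-in `sqlite3` module as a stand-in database. The table name `orders` and its columns are hypothetical, and the SQL that the operator actually generates may differ.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "widget", 3),
        ("east", "widget", 5),
        ("west", "widget", 1),
        ("east", "gadget", 2),
    ],
)

# Treat region and product as the selected distinct columns.
rows = conn.execute("SELECT DISTINCT region, product FROM orders").fetchall()

# Row order is not guaranteed, so sort before displaying or comparing.
distinct_rows = sorted(rows)
print(distinct_rows)
```

The two `("east", "widget", ...)` input rows collapse into one output row because `qty` is not among the selected columns.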
Bad or Missing Values
Missing values are taken into account when distinct values are determined: a missing value in a column is treated as distinct from any actual value.
This operator handles null values by eliminating them from the input calculation. To prevent this behavior, use the Null Value Replacement (DB) operator on the initial training data to replace bad or missing values.
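To see how missing values can affect a distinct result on a given database, it may help to compare the two treatments side by side. The sketch below uses SQLite through Python's `sqlite3` module. The `readings` table and its columns are hypothetical, and the behavior shown is SQLite's, which is not necessarily what the operator does on your database.

```python
import sqlite3

# Hypothetical table; None is stored as SQL NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", "ok"), ("a", None), ("a", None), ("b", "ok"), ("a", "ok")],
)

# Treatment 1: NULL kept as its own value; repeated NULLs collapse to one row.
kept = conn.execute(
    "SELECT DISTINCT sensor, status FROM readings"
).fetchall()
null_rows = [row for row in kept if row[1] is None]

# Treatment 2: rows with a NULL in the distinct column are filtered out first.
dropped = sorted(
    conn.execute(
        "SELECT DISTINCT sensor, status FROM readings WHERE status IS NOT NULL"
    ).fetchall()
)

print(len(kept), null_rows, dropped)
```

Running a similar comparison against a small sample of your own data is one way to check which treatment applies before deciding whether to replace missing values upstream.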
Configuration
| Parameter | Description |
|---|---|
| Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
| Distinct Columns *required | Select one or more columns from the data source by which to generate rows of data, where each row has a distinct combination of column values. |
| Output Type | |
| Output Schema | The schema for the output table or view. |
| Output Table | Specify the table path and name where the output of the results is generated. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
| Drop If Exists | Specifies whether to overwrite an existing table. |
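
The interaction between Output Table and Drop If Exists can be pictured as an optional `DROP TABLE IF EXISTS` followed by a `CREATE TABLE ... AS SELECT DISTINCT`. This is a sketch under assumptions, not the operator's actual implementation: `write_distinct`, `src`, and `out_tbl` are hypothetical names, and SQLite stands in for the target database.

```python
import sqlite3


def write_distinct(conn, source, columns, output_table, drop_if_exists):
    """Hypothetical helper: materialize distinct rows into output_table.

    Identifiers are interpolated directly, so they must come from trusted
    configuration, never from untrusted input.
    """
    cols = ", ".join(columns)
    if drop_if_exists:
        # Overwrite: remove any earlier output before recreating it.
        conn.execute(f"DROP TABLE IF EXISTS {output_table}")
    conn.execute(
        f"CREATE TABLE {output_table} AS SELECT DISTINCT {cols} FROM {source}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)", [("x", "1"), ("x", "1"), ("y", "2")]
)

# First run creates the output table; the second run overwrites it.
write_distinct(conn, "src", ["a", "b"], "out_tbl", drop_if_exists=True)
write_distinct(conn, "src", ["a"], "out_tbl", drop_if_exists=True)

result = sorted(conn.execute("SELECT * FROM out_tbl").fetchall())
print(result)
```

In this sketch, calling the helper with `drop_if_exists=False` while `out_tbl` already exists makes SQLite raise `sqlite3.OperationalError`. How the operator itself reports that situation is not stated here.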
Output
Data Output
A subset of the data that contains only the selected columns, in which each row holds a distinct combination of values in those columns.
