Distinct (DB)

Returns only distinct combinations of values from specified columns of a database source. Rows are returned in no particular order, and no two rows contain the same combination of values.
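As a rough illustration of the behavior described above, the following sketch uses Python's built-in sqlite3 module as a stand-in database. The table name `orders` and the columns `region`, `product`, and `qty` are hypothetical, and the SQL that the operator actually generates is not shown in this documentation; `SELECT DISTINCT` is assumed here as the equivalent query.

```python
import sqlite3

# Hypothetical in-memory table; all names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "widget", 5),
        ("east", "widget", 7),
        ("east", "gadget", 2),
        ("west", "widget", 1),
        ("west", "widget", 9),
    ],
)

# region and product play the role of the selected distinct columns.
rows = conn.execute("SELECT DISTINCT region, product FROM orders").fetchall()

# Row order is not guaranteed, so collect the result into a set.
distinct_pairs = set(rows)
print(distinct_pairs)
```

Five input rows collapse to three rows, one per combination of `region` and `product`. The `qty` column does not appear in the result because it was not selected.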

Information at a Glance

Note: You can also use this operator in a workflow that uses TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter Description
Category Transform
Data source type DB
Send output to other operators Yes
Data processing tool SQL

Note: The Distinct (DB) operator is for database data only. For Hadoop data, use the Distinct (HD) operator.

Input

A database source. Users choose the columns from which they want distinct combinations, and the operator performs the calculation.

Bad or Missing Values
Missing values are taken into account when distinct values are determined. A missing value in a column is treated as distinct from any non-missing value.
This operator handles null values by eliminating them from the input calculation. To prevent this behavior, use the Null Value Replacement (DB) operator on the input data to replace bad or missing values.
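The sketch below shows how one database engine, SQLite, treats missing values under `SELECT DISTINCT`, and how replacing them beforehand changes the result. It is an assumption-laden illustration: the table `t` and column `color` are hypothetical, `COALESCE` is used only as a stand-in for a null-replacement step, and the operator's behavior on your database may differ from what SQLite does here.

```python
import sqlite3

# Hypothetical table with two missing values in the color column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (color TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("red",), (None,), ("red",), (None,)],
)

# SQLite treats NULLs as equal to each other when removing duplicates,
# so NULL appears once and stays separate from 'red'.
with_nulls = set(conn.execute("SELECT DISTINCT color FROM t").fetchall())

# Replacing NULL with a placeholder before taking distinct values
# (a stand-in for a null-replacement step upstream).
replaced = set(
    conn.execute("SELECT DISTINCT COALESCE(color, 'unknown') FROM t").fetchall()
)

print(with_nulls)
print(replaced)
```

Running a quick check like this against your own database is a simple way to confirm how missing values appear in the distinct output before building a workflow around it.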

Configuration

Parameter Description
Notes Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Distinct Columns (*required) Select one or more columns from the data source by which to generate rows of data, where each row has a distinct combination of column values.
Output Type
  • TABLE outputs a database table. Specifying TABLE enables Storage Parameters.
  • VIEW outputs a database view.
Output Schema The schema for the output table or view.
Output Table The path and name of the table where the results are stored. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Drop If Exists Specifies whether to overwrite an existing table.
  • Yes - If a table with the name exists, it is dropped before storing the results.
  • No - If a table with the name exists, the results window shows an error message.
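The Output Type and Drop If Exists settings can be pictured with the sketch below, again using SQLite through Python's sqlite3 module. The helper `write_distinct` and its parameters are hypothetical and are not part of the product; the statements it issues are an assumption about what equivalent SQL could look like, not the SQL the operator generates.

```python
import sqlite3

def write_distinct(conn, source, columns, output,
                   output_type="TABLE", drop_if_exists=True):
    """Hypothetical helper: store distinct rows as a table or a view."""
    kind = output_type.upper()
    if kind not in ("TABLE", "VIEW"):
        raise ValueError("output_type must be TABLE or VIEW")
    if drop_if_exists:
        # Drop If Exists = Yes: remove any existing output first.
        conn.execute(f'DROP {kind} IF EXISTS "{output}"')
    col_list = ", ".join(f'"{c}"' for c in columns)
    # Drop If Exists = No: this statement fails if the output already exists.
    conn.execute(
        f'CREATE {kind} "{output}" AS SELECT DISTINCT {col_list} FROM "{source}"'
    )

# Illustrative usage with a hypothetical source table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE src (a TEXT, b INTEGER)")
demo.executemany("INSERT INTO src VALUES (?, ?)", [("x", 1), ("x", 2), ("y", 3)])
write_distinct(demo, "src", ["a"], "out_t")                     # table output
write_distinct(demo, "src", ["a"], "out_v", output_type="VIEW")  # view output
```

In this sketch, calling `write_distinct` with `drop_if_exists=False` when the output already exists raises `sqlite3.OperationalError`, which loosely mirrors the error message described for the No setting.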

Output

Data Output
A subset of the data that contains only the selected columns, in which each row holds a distinct combination of values in those columns.