Distinct (DB)

Returns only the distinct combinations of values from the specified columns of a database source. Rows are not returned in any particular order, but the combination of values in each row is distinct from that in every other row.
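The behavior described above corresponds to a SQL SELECT DISTINCT over the chosen columns. The following is a minimal sketch, not the operator's actual generated SQL: it assumes an in-memory SQLite database in place of the real database source, and the table name orders and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real database source
# (assumption: the operator's own SQL dialect and connection are not shown).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "widget", 5),
        ("east", "widget", 9),  # same (region, product) as the row above
        ("east", "gadget", 2),
        ("west", "widget", 1),
    ],
)

# Only the selected columns appear in the result; qty is not part of it.
rows = conn.execute("SELECT DISTINCT region, product FROM orders").fetchall()

# There is no ORDER BY, so treat the result as a set instead of
# relying on the order in which rows come back.
distinct_pairs = set(rows)
print(distinct_pairs)
```

Four input rows yield three output rows, because the two ("east", "widget") rows collapse into one.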

Information at a Glance

Note: You can also use this operator in a workflow that uses TIBCO® Data Virtualization and Apache Spark 3.2 or later.

Parameter	Description
Category	Transform
Data source type	DB
Send output to other operators	Yes
Data processing tool	SQL

Note: The Distinct (DB) operator is for database data only. For Hadoop data, use the Distinct (HD) operator.

Input

A database source. Users choose the columns from which they want distinct combinations, and the operator performs the calculation.

Bad or Missing Values

Missing values are included when determining distinct values. A missing value in a column is treated as distinct from any non-missing value.

This operator handles null values by eliminating them from the input calculation. To prevent this behavior, use the Null Value Replacement (DB) operator on the initial training data to replace bad or missing values.
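As an illustration of how missing values interact with distinct selection, the sketch below shows what a plain SQL SELECT DISTINCT does with NULL, and how replacing NULL beforehand changes the result. It assumes SQLite semantics and does not model any null filtering the operator itself may apply. The table contacts is hypothetical, and COALESCE with the placeholder 'unknown' only stands in for the Null Value Replacement (DB) operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?)",
    [("Oslo",), (None,), ("Oslo",), (None,)],
)

# In SQLite, SELECT DISTINCT keeps NULL and collapses all NULLs
# into a single row that is distinct from every non-NULL value.
with_nulls = conn.execute("SELECT DISTINCT city FROM contacts").fetchall()

# Replacing NULL before the distinct step; 'unknown' is an arbitrary
# placeholder chosen for this example.
replaced = conn.execute(
    "SELECT DISTINCT COALESCE(city, 'unknown') FROM contacts"
).fetchall()

print(set(with_nulls))
print(set(replaced))
```

Both queries return two rows. The difference is that the second one carries an explicit placeholder where the first carries NULL.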

Configuration

Parameter	Description
Notes	Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator.
Distinct Columns *required	Select one or more columns from the data source by which to generate rows of data, where each row has a distinct combination of column values.
Output Type	TABLE - Outputs a database table. Specifying TABLE enables Storage Parameters. VIEW - Outputs a database view.
Output Schema	The schema for the output table or view.
Output Table	Specify the path and name of the table where the results are written. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Drop If Exists	Specifies whether to overwrite an existing table. Yes - If a table with the name exists, it is dropped before storing the results. No - If a table with the name exists, the results window shows an error message.
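The Output Type and Drop If Exists settings can be pictured with the following sketch. It is an illustration under stated assumptions, not the operator's implementation: write_distinct is a hypothetical helper, SQLite stands in for the target database, the source table src is invented for the example, and Storage Parameters are not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("y", "2")],
)


def write_distinct(conn, name, output_type="TABLE", drop_if_exists=True):
    """Hypothetical helper: store distinct (a, b) rows as a table or a view."""
    if output_type not in ("TABLE", "VIEW"):
        raise ValueError("output_type must be 'TABLE' or 'VIEW'")

    # Look up whether an object with this name already exists.
    existing = conn.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()
    if existing is not None:
        if not drop_if_exists:
            # Mirrors Drop If Exists = No: report an error, do not overwrite.
            raise RuntimeError(f"{name} already exists")
        # Mirrors Drop If Exists = Yes: drop the old object first.
        conn.execute(f"DROP {existing[0].upper()} {name}")

    # Identifiers cannot be bound as parameters, so the name is interpolated;
    # only pass trusted names to a helper like this.
    conn.execute(
        f"CREATE {output_type} {name} AS SELECT DISTINCT a, b FROM src"
    )


write_distinct(conn, "distinct_out", output_type="TABLE")
print(conn.execute("SELECT COUNT(*) FROM distinct_out").fetchone()[0])
```

A table stores a snapshot of the distinct rows at creation time, whereas a view re-evaluates the query each time it is read.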

Output

Data Output

A subset of the data that contains only the selected columns, in which each row holds a distinct combination of values in those columns.
