Distinct (DB)

Returns only the distinct combinations of values in the specified columns of a database source. Rows are returned in no particular order, but each row's combination of values differs from that of every other row.
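Because the operator's data processing tool is SQL, its result can be pictured as a `SELECT DISTINCT` over the chosen columns. The sketch below assumes exactly that and uses Python's built-in `sqlite3` module as a stand-in database; the `sales` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real database source.
# Assumption: the operator's SQL is equivalent to SELECT DISTINCT on the chosen columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "widget", 10.0),
        ("east", "widget", 12.5),  # same (region, product) as the row above
        ("west", "widget", 7.0),
        ("east", "gadget", 3.0),
    ],
)

# Distinct combinations of the selected columns only; 'amount' is not selected.
# ORDER BY is added here only so the printed result is deterministic;
# the operator itself does not guarantee any row order.
rows = conn.execute(
    "SELECT DISTINCT region, product FROM sales ORDER BY region, product"
).fetchall()
print(rows)  # [('east', 'gadget'), ('east', 'widget'), ('west', 'widget')]
```

The two `east`/`widget` rows collapse into one because `amount` is not among the selected columns, so their differing amounts do not make them distinct.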

Information at a Glance

Category: Transform
Data source type: DB
Sends output to other operators: Yes
Data processing tool: SQL
Note: The Distinct (DB) operator is for database data only. For Hadoop data, use the Distinct (HD) operator.

Input

A database source. You choose the columns whose distinct combinations of values you want, and the operator computes them.

Bad or Missing Values
Missing values take part in determining distinct combinations: a missing value in a column is treated as distinct from any non-missing value in that column.
This operator handles null values by eliminating them from the input calculation. To prevent this behavior, use the Null Value Replacement operator on the input data to replace bad or missing values.
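How null rows end up in the output depends on the SQL the operator generates. In standard SQL, `SELECT DISTINCT` treats all nulls in a column as one value that is distinct from every non-null value. The sketch below shows that behavior in SQLite, then uses `COALESCE` to replace nulls before deduplication, as a rough stand-in for running Null Value Replacement upstream. The `customers` table, the `city` column, and the placeholder `'unknown'` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Boston",), (None,), (None,), ("Boston",), ("unknown",)],
)

# Standard SQL DISTINCT: the two NULLs collapse into a single row,
# which is kept separate from every non-null value.
raw = conn.execute("SELECT DISTINCT city FROM customers").fetchall()

# Replacing NULLs first (a rough stand-in for the Null Value Replacement
# operator) merges them with the existing placeholder value 'unknown'.
replaced = conn.execute(
    "SELECT DISTINCT COALESCE(city, 'unknown') AS city FROM customers"
).fetchall()

print(len(raw), len(replaced))  # 3 2
```

Note the design trade-off: if the replacement value already occurs in the data, replaced nulls become indistinguishable from it, so choose a placeholder that cannot collide with real values.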

Configuration

Notes: Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Distinct Columns (required): Select one or more columns from the data source. The output contains one row for each distinct combination of values in these columns.
Output Type:
  • TABLE outputs a database table. Specifying TABLE enables Storage Parameters.
  • VIEW outputs a database view.
Output Schema: The schema for the output table or view.
Output Table: The path and name of the table where the results are stored. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Drop If Exists: Specifies whether to overwrite an existing table.
  • Yes - If a table with the name exists, it is dropped before the results are stored.
  • No - If a table with the name exists, the results window shows an error message.
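The configuration settings can be pictured as a short sequence of SQL statements. The sketch below is hypothetical: `build_statements` and `quote_ident` are illustrative helpers, not part of the product, and SQLite stands in for the target database (its `main` schema plays the role of Output Schema). It assumes that Drop If Exists = Yes issues a `DROP ... IF EXISTS` first, and that Drop If Exists = No simply lets the `CREATE` fail when the target already exists.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_statements(source, columns, output_schema, output_table,
                     output_type="TABLE", drop_if_exists=True):
    """Hypothetical translation of the operator settings into SQL statements."""
    if not columns:
        raise ValueError("Distinct Columns requires at least one column")
    if output_type not in ("TABLE", "VIEW"):
        raise ValueError("Output Type must be TABLE or VIEW")
    target = f"{quote_ident(output_schema)}.{quote_ident(output_table)}"
    cols = ", ".join(quote_ident(c) for c in columns)
    statements = []
    if drop_if_exists:
        # Drop If Exists = Yes: remove any existing object with this name first.
        statements.append(f"DROP {output_type} IF EXISTS {target}")
    # With Drop If Exists = No, this CREATE fails if the target already exists.
    statements.append(
        f"CREATE {output_type} {target} AS "
        f"SELECT DISTINCT {cols} FROM {quote_ident(source)}"
    )
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("east", "widget", 1.0), ("east", "widget", 2.0),
                  ("west", "gadget", 3.0)])

# Output Type = TABLE, Drop If Exists = Yes (the defaults of this sketch).
for sql in build_statements("sales", ["region", "product"], "main", "sales_distinct"):
    conn.execute(sql)
count = conn.execute('SELECT COUNT(*) FROM "main"."sales_distinct"').fetchone()
print(count)  # (2,)

# Re-running with Drop If Exists = No reports an error because the table exists.
raised = False
try:
    for sql in build_statements("sales", ["region", "product"], "main",
                                "sales_distinct", drop_if_exists=False):
        conn.execute(sql)
except sqlite3.OperationalError as exc:
    raised = True
    print("error:", exc)
```

Passing `output_type="VIEW"` produces `CREATE VIEW` instead. A view stores only the query, so the distinct result is recomputed each time the view is read, whereas a table materializes it once.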

Output

Data Output
A subset of the input data that contains only the selected columns, in which each row holds a distinct combination of values for those columns.