Deduplicating Data

Use the dedup (deduplication) function to check for duplicate records.

Note: For the enterprise edition, you must configure a connection to the TIBCO® Patterns server before you use the dedup function.

You can create a switchable group of columns, select the individual columns that you want to check for duplicates, or combine both. This example creates a switchable group from two columns and also selects one individual column.
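The dedup function is configured in the UI, but the idea behind a switchable group can be sketched in Python. This minimal sketch assumes that a switchable group lets values match even when they were entered in swapped columns (for example, a last name stored in the FirstName column). The helper switchable_key is hypothetical and is not part of any TIBCO API.

```python
def switchable_key(*values: str) -> tuple:
    """Build an order-insensitive key for a hypothetical switchable group.

    Values are trimmed, lower-cased, and sorted, so ("John", "Smith")
    and ("Smith", "John") produce the same key.
    """
    return tuple(sorted(v.strip().lower() for v in values))


# Two records whose first and last names were entered in swapped columns.
row_a = {"FirstName": "John", "LastName": "Smith"}
row_b = {"FirstName": "Smith", "LastName": "John"}

key_a = switchable_key(row_a["FirstName"], row_a["LastName"])
key_b = switchable_key(row_b["FirstName"], row_b["LastName"])
print(key_a == key_b)  # True
```

Sorting the normalized values is the simplest way to ignore column order; a real matching engine would typically also tolerate spelling differences, which this sketch does not.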

Procedure

  1. On the toolbar, click Dedup.
  2. From the menu next to Column name, click Create a switchable group, and then select the FirstName and LastName check boxes. Click elsewhere.
    A switchable group named FirstNameLastName is created.
  3. Select the SSN check box.
  4. Ensure that the weight values for the FirstNameLastName group and the SSN column are both 1, and then click Run.
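How the weights are combined is not described in this section. The following sketch assumes a simple weighted average of per-field matches, where each field is either the FirstNameLastName group or the SSN column. Exact string comparison stands in for the fuzzy matching that the TIBCO Patterns server performs, and the names FIELDS, field_key, and weighted_score are hypothetical.

```python
# Hypothetical field definitions: the switchable group maps to its member columns.
FIELDS = {
    "FirstNameLastName": ("FirstName", "LastName"),
    "SSN": ("SSN",),
}


def field_key(row: dict, field: str) -> tuple:
    """Normalize a field's values; sorting makes the name pair order-insensitive."""
    return tuple(sorted(str(row[col]).strip().lower() for col in FIELDS[field]))


def weighted_score(row_a: dict, row_b: dict, weights: dict) -> float:
    """Weighted share of fields that match exactly, from 0.0 to 1.0."""
    total = sum(weights.values())
    matched = sum(
        weight
        for field, weight in weights.items()
        if field_key(row_a, field) == field_key(row_b, field)
    )
    return matched / total


weights = {"FirstNameLastName": 1, "SSN": 1}  # the values set in step 4
a = {"FirstName": "John", "LastName": "Smith", "SSN": "123-45-6789"}
b = {"FirstName": "Smith", "LastName": "John", "SSN": "123-45-6789"}
c = {"FirstName": "John", "LastName": "Smith", "SSN": "987-65-4321"}

print(weighted_score(a, b, weights))  # 1.0
print(weighted_score(a, c, weights))  # 0.5
```

With both weights set to 1, a match on only one of the two fields yields 0.5. Under this model, raising the SSN weight to 2 would make an SSN match count twice as much as a name match.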

Result

Duplicate rows are marked with an icon. Three new columns are added: dedup_isLead, dedup_group, and dedup_rowIndex. The following table describes these columns:
Column Name     Data Type  Description
dedup_isLead    Boolean    true: This row is the first row found in its group.
                           false: This row is not the first row found in its group.
dedup_group     Integer    0: This row is unique.
                           >0: This row belongs to a group of duplicate rows.
dedup_rowIndex  Integer    The original index of the row.
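The following sketch shows one way the three result columns could be derived once matching rows have been grouped. It is a simplified model, not the product's implementation: rows are grouped by an exact key (an order-insensitive name pair plus the SSN) instead of by fuzzy scores, row indexes are assumed to be 0-based, and a unique row is assumed to have dedup_isLead set to true. The functions annotate_duplicates and name_ssn_key are hypothetical.

```python
from collections import defaultdict


def annotate_duplicates(rows, key):
    """Return copies of rows with dedup_isLead, dedup_group, and dedup_rowIndex added.

    Rows whose key(row) values are equal form one group. Groups with two or
    more rows are numbered 1, 2, ... in order of first appearance; rows with
    no duplicate get group 0.
    """
    # Collect the original indexes of the rows that share each key.
    members = defaultdict(list)
    for index, row in enumerate(rows):
        members[key(row)].append(index)

    group_ids = {}
    annotated = []
    for index, row in enumerate(rows):
        k = key(row)
        indexes = members[k]
        if len(indexes) > 1:
            # Number duplicate groups in the order they are first encountered.
            group_ids.setdefault(k, len(group_ids) + 1)
            group = group_ids[k]
        else:
            group = 0
        annotated.append({
            **row,
            # Assumption: a unique row counts as the lead of its own one-row group.
            "dedup_isLead": indexes[0] == index,
            "dedup_group": group,
            # Assumption: the original row index is 0-based.
            "dedup_rowIndex": index,
        })
    return annotated


def name_ssn_key(row):
    # Order-insensitive name pair (hypothetical switchable group) plus the SSN.
    names = tuple(sorted(row[c].strip().lower() for c in ("FirstName", "LastName")))
    return names, row["SSN"].strip()


rows = [
    {"FirstName": "John", "LastName": "Smith", "SSN": "123-45-6789"},
    {"FirstName": "Ann", "LastName": "Lee", "SSN": "555-12-3456"},
    {"FirstName": "Smith", "LastName": "John", "SSN": "123-45-6789"},
]

for r in annotate_duplicates(rows, name_ssn_key):
    print(r["dedup_rowIndex"], r["dedup_group"], r["dedup_isLead"])
# 0 1 True
# 1 0 True
# 2 1 False
```

Rows 0 and 2 form duplicate group 1, with row 0 as the lead because it is found first; row 1 has no match, so its group is 0.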