Detecting Duplicates
Use the Dedup (deduplication) function to detect duplicate or similar data in a project or in an external table uploaded from TIBCO Patterns.
- For the cloud edition, the connection is automatically established.
- For the enterprise edition, manually establish a connection before using the Dedup function. See Configuring Patterns Server Settings.
Each data column to be checked is translated into a querylet. TIBCO Patterns calculates a score for each querylet, and then uses the querylet scores of all columns to calculate a final score for the compared data rows. The final score indicates how similar the compared data is. A score of 1 indicates that the compared data matches exactly.
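The relationship between querylet scores and the final row score can be illustrated with a minimal sketch. It does not reproduce how TIBCO Patterns computes scores: the column comparison uses Python's `difflib` as a stand-in, the weighted average is an assumed way of combining scores, and the names `column_score`, `row_score`, and `weights` are hypothetical.

```python
from difflib import SequenceMatcher


def column_score(a: str, b: str) -> float:
    """Stand-in for one querylet score: 1.0 means the two values are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def row_score(row_a: dict, row_b: dict, weights: dict) -> float:
    """Combine per-column scores into one final score.

    A weighted average is an assumption made for illustration only.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * column_score(row_a[col], row_b[col])
        for col, weight in weights.items()
    )
    return weighted / total_weight


# Hypothetical columns and weights: "name" counts twice as much as "city".
weights = {"name": 2.0, "city": 1.0}
row_1 = {"name": "John Smith", "city": "Boston"}
row_2 = {"name": "Jon Smith", "city": "Boston"}

identical = row_score(row_1, row_1, weights)  # every column matches exactly
similar = row_score(row_1, row_2, weights)    # close, but not an exact match
```

In this sketch, comparing a row with itself gives a final score of 1, while the two slightly different rows give a score below 1.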
When checking for duplicates against a project or an external table, you must specify a score threshold (from 0.0 to 1.0) that defines the accuracy of the query. If the final score calculated for the compared rows is greater than the score threshold, the rows are placed in the same duplication group.
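The effect of the score threshold on grouping can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the pair scores are made-up values, `group_duplicates` is a hypothetical helper, and the sketch assumes grouping is transitive (if rows A and B match, and B and C match, all three share a group).

```python
def group_duplicates(row_ids, pair_scores, threshold):
    """Group rows whose pairwise final score is greater than the threshold."""
    parent = {row: row for row in row_ids}

    def find(row):
        # Follow parent links to the group representative, compressing the path.
        while parent[row] != row:
            parent[row] = parent[parent[row]]
            row = parent[row]
        return row

    for (a, b), score in pair_scores.items():
        if score > threshold:  # strictly greater than, as described above
            parent[find(a)] = find(b)

    groups = {}
    for row in row_ids:
        groups.setdefault(find(row), []).append(row)
    # Rows without any duplicate do not form a duplication group.
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical final scores for pairs of row IDs.
pair_scores = {(1, 2): 0.92, (2, 3): 0.85, (1, 3): 0.60, (3, 4): 0.80}
groups = group_duplicates([1, 2, 3, 4], pair_scores, threshold=0.8)
```

With a threshold of 0.8, rows 1, 2, and 3 end up in one duplication group. Row 4 is left out because its score of 0.80 is equal to the threshold, not greater than it. Raising the threshold produces fewer, stricter groups; lowering it groups more loosely similar rows.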
