A matching operation compares data from two or more records to find potential duplicates. You create a Matching policy to tell the add-on:
when to execute a matching operation. You can require users to manually execute a matching operation. Alternatively, a matching operation can run automatically when users create or update records.
how to execute a matching operation. You use a decision tree to provide the add-on with criteria that determines whether records are duplicates. You can also apply a merge policy that takes action based on the results of the record evaluation. Please note that creation of a merge policy is covered in Task 4: Configuring a merge policy.
What's next? Follow the instructions in Task 3a to configure a matching policy.
You create a matching policy under and apply it to a table registered with the add-on. When creating the matching policy you tell the add-on when to execute. Users can run matching manually from a table menu, or you can specify that matching executes inline during record creation or update. When matching executes inline, a matching policy automatically executes after creating or modifying the record. If the matching operation finds potentially duplicate (suspect) records, users are notified and can review and make changes to the record.
To create a matching policy (skip to step 2 if picking up from Task 1):
From the main menu bar, select the Administration panel and open Data Quality & Analytics > TIBCO EBX® Match and Merge Add-on and open the table settings you want to edit.
Select the Matching policies tab.
Select the '+' icon to create a new matching policy and fill in the required fields on the page that displays. The image below highlights required options:
1) Matching policy code | The unique code used to identify this matching policy. |
2) Active | Sets whether this matching policy is active. |
3) Execute matching | The Execute matching group determines how matching executes during creation or update of records. If set to:
NoteIf you set matching execution to After submission, Before creation or Inline, you must set this matching policy as the table's Default matching policy. This option is available on the configuration's Main tab. Additionally, when a user creates or updates a record, the add-on will only execute automatically if a field configured for matching is modified. |
4) Execute purge | Determines whether the add-on automatically removes old matching results. If set to Automatically the add-on executes this policy each time matching executes. As this process can be time consuming, you can set it to Manually and allow users to execute as needed. They can access this functionality from a table's Actions menu by selecting: Match and Merge > Purge old results. |
5) Merge policy | Tells the add-on which merge policy to link to this matching policy. After creating a merge policy, you must specify it here in order to activate the automatic merge feature. See Task 4: Configuring a merge policy for instructions on creating a merge policy. |
After saving, the following tabs display:
Decision tree: Contains settings that determine how the add-on compares data values.
Matching fields: Contains configuration settings for each field used in a matching policy.
What's Next? Use the instructions in Task 3b: Creating a decision tree to create a matching decision tree to specify how this matching policy compares data values.
You create a matching decision tree to tell the add-on which data values to compare and how to compare them. Based on the comparison result you can specify that the add-on set a record's relationship status, or perform additional tests to ensure accurate results. A relationship status identifies records as positive or negative matches, or as suspected of being a match. The add-on creates groups of records identified as positive matches.
To create a decision tree:
Select the Decision tree tab.
As shown below, 1) drag a Data comparison test from the Toolbox and 2) drag the bottom connection point from the Start node and link it with the test.
Double-click the test to edit its functions:
The Name property sets a name for this test that displays on the tree.
Evaluation method: the add-on uses the selected method to compare the fields you add. When you choose an option from the menu, you can select the preview icon to view a detailed description. The following comparison methods are available:
All fields match: For this data comparison to evaluate as true, each included field must meet or exceed its minimum score setting. See the definition of Minimum score in step 3 for more information.
At least N fields match: This option allows you to specify that a certain number of included fields must meet or exceed their minimum score setting for this data comparison to evaluate as true. When you select this option, the Minimum number of matches property displays and allows you specify how many fields must match.
Average: An average score is computed from the score of each included field. If this score meets or exceeds the value set in Minimum score, this data comparison evaluates to true. This function also allows you to assign a weight to each field. When using multiple fields as part of one data comparison, the Weight property is used to compute a weighted average and assign a weight to each field. Acceptable values are between 0 and 100. The higher the value, the more weight the field carries. You might set a higher weight on a field that could determine the output earlier in the decision tree. If you set the weight to 0, this field is excluded from the matching pre-processing phase.
By default, during the pre-processing phase of matching, field weights are automatically balanced so that each field equally affects the final score. This means that fields that contain multiple values have the same impact on the final matching score as fields with single values. To disable this behavior, set the ebx.clustering.balance
property to false
in the ebx.properties
file. See Understanding matching operation processing for more information.
To improve matching performance, set the weight to 0 for fields that have a limited number of options. For example, fields that are boolean or enumerations.
At least one or more fields should have a weight above 0 defined. If all fields are set to 0, matching results are not reliable.
Predicate: This option is deprecated. Instead, in the Compare using field, select Operator and the Equals option.
Fields (One or more fields used in the comparison):
The Use relationships menu allows you to specify whether this field is: a regular field, a relationship through a link table, or a foreign key relationship. Before a related field is available to select, you must configure it in the Matching fields tab. See Matching with relationships for more information.
The Compare using option allows you to choose whether to use an algorithm or an operator when comparing the selected field's values. When you select:
Algorithm: Use the Algorithm menu to select the algorithm used to compute a score for this field. The list of available algorithms is filtered depending on the search strategy specified for this field in the data model. In some cases, an algorithm might be available but not recommended for use with a particular search strategy. These algorithms display in a lighter color font at the bottom of the list. If you select an algorithm that is not compatible with the defined search strategy, a warning icon displays. When configuring the algorithm for an added field using the Matching fields tab (described below), a link displays to allow you to change the search strategy defined for this field. After changing the strategy, the list of available algorithms automatically updates.
When you select the:
Exact algorithm, you can to specify whether to use case sensitivity on fields with String and Text data types.
If you select a foreign key field, you should only select the Exact algorithm.
Hybrid fuzzy algorithm, you can adjust parameters related to several of the criteria used in the algorithm. The higher you set the value for a particular criterion, the more it will count when calculating the final similarity score. See Matching algorithms for a more in-depth description of this algorithm.
Operator: Use the Operator menu to select the operator used to compare values in the selected field. If you select Equals, a field displays that allows you to enter a value that must be present in the field to consider it a match.
The operators available to select depend on the data type of the chosen field. If the data type is:
date
, time
, dateTime
, decimal
, text
, string
, integer
, boolean
, or email
; the available options are: Equals, Either null, Both null, and Both not null.
any other data type, the available options are: Either null, Both null, and Both not null.
If you use any operator that checks for null values, set the weight for the matching field to 0 in the Matching fields tab.
Minimum score: Sets the minimum percentage score required to consider data values from this field as matching. Acceptable values are between 1 and 100. The higher you set this value, the more similar the data values must be to match.
You cannot select computed, multi-value, or linked fields in the Fields menu.
Use the matching policy's Matching fields tab to access additional functionality related to fields, including the exclusion of records from a matching operation. See Excluding records for more information. Please note that the Search strategy option in the Matching fields tab must use the same type of strategy (phonetic or distance) defined in the comparison node's Algorithm property.
Specify the relationship status if the test evaluates to true by dragging one of the following from the Toolbox and connecting it with the test's right output point:
Match: the records will be considered a match.
Suspect: the records are suspected of matching, but require human review.
No Match: the records do not match.
Specify the relationship status if the test evaluates to false by connecting the desired status to the test's left output point. Note that if you want the result to trigger a new test, add another Data comparison node and connect it to desired output.
Use the process shown in the previous steps to build out your matching tree. Once finished, select Save and close.
The following example shows using three comparisons. If the first comparison evaluates to true, the records are considered suspects from that point on. If the second and third comparisons also evaluate to true, the records are marked as matches.
If required, set the configuration's default matching policy:
From the Matching policies tab:
Select the policy you want to set as the default.
From the Actions menu, select Set as default.
Save your changes.
From the Main tab:
In the Default matching policy menu select the policy you just created.
Save your changes.
What's next? See Task 4: Configuring a merge policy for instructions on setting up the add-on to merge data.