Understanding matching operation processing

Overview

The add-on and the EBX® core share responsibility for executing the matching process. The core product pre-processes the list of records to be matched, finds similar records, and passes them to the add-on. To find similar records, EBX® takes the base record (the record to match against other records) and applies the parameter settings configured for each matching field.

If a field configured to participate in matching meets either of the following criteria, it is automatically disabled during the pre-processing phase to improve performance:

The field's type is boolean.
The entire column is null.
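
The two rules above can be illustrated with a short Python sketch. This is not the EBX® implementation: the function name, the in-memory record format (a list of dictionaries), and the type label "boolean" are assumptions made for illustration only.

```python
from typing import Any


def fields_disabled_in_preprocessing(
    records: list[dict[str, Any]],
    field_types: dict[str, str],
    matching_fields: list[str],
) -> list[str]:
    """Return the matching fields that the two rules above would disable.

    Hypothetical helper: EBX® applies these rules internally and does not
    expose a function with this name or signature.
    """
    disabled = []
    for field in matching_fields:
        # Rule 1: the field's type is boolean.
        is_boolean = field_types.get(field) == "boolean"
        # Rule 2: every record holds null (None) in this field.
        # all() over an empty list is True, so an empty table counts as all-null here.
        all_null = all(record.get(field) is None for record in records)
        if is_boolean or all_null:
            disabled.append(field)
    return disabled


records = [
    {"name": "Ann Lee", "active": True, "fax": None},
    {"name": "Anne Lee", "active": False, "fax": None},
]
field_types = {"name": "string", "active": "boolean", "fax": "string"}

print(fields_disabled_in_preprocessing(records, field_types, ["name", "active", "fax"]))
# prints ['active', 'fax']
```

In this example, only the name field would still contribute to finding similar records.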

When input consists of a selection of records, the output is a group of similar records. If the input is an entire table, additional processing is performed and the output is pairs of similar records. The add-on processes the output from EBX® using the logic defined in a matching policy's decision tree to determine whether values match. The diagram below visualizes this process:

Adjusting parameter settings

Adjusting parameter settings changes the criteria EBX® uses to find similar records, which in turn can change the records passed to the add-on for processing. Fine-tune the settings to obtain the desired results, but balance restrictive settings against permissive ones: criteria that are too strict can prevent records from reaching the add-on even though a decision tree would consider them a match.

To edit parameter settings:

Access the configuration settings for the matching field you want to update:
1. Navigate to Administration > TIBCO EBX® Match and Merge Add-on > Table activation and settings.
2. Open the table configuration and matching policy containing the field.
3. Select the Matching fields tab and open the field.
Use the following options under Pre-processing to change parameter settings:
1. Search strategy: Sets the search strategy used to group similar records during the pre-clustering process. The list only includes search strategies defined in the data model containing the selected field. See the EBX® Reference Manual for more information about search strategies.
2. Weight: Specifies the weight assigned to this field during pre-processing by EBX®. When a matching operation includes multiple fields, the weight sets the relative importance of each field's score in determining whether records are similar. Note that this weight does not affect the weighted average in the decision tree comparison nodes.
3. Null value management: Determines the behavior when two records are compared and one or both of them have a null value in this field. If you enable matching of null values, give this field a lower weight than the other fields. Otherwise, a record might not be grouped with other similar records during pre-processing.
Save and close to keep your changes.
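
To make the effect of the Weight and Null value management options more concrete, the following Python sketch computes a weighted similarity score for two records. It is only an illustration under stated assumptions: the actual scoring formula used by EBX® is not described here, the function names are hypothetical, field similarity is reduced to an exact comparison, and the null policy (two nulls count as similar only when null matching is enabled for the field) is an assumed simplification.

```python
from typing import Any, Callable


def exact_similarity(a: Any, b: Any) -> float:
    """Simplest possible field comparison: 1.0 if equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def preprocessing_score(
    base: dict[str, Any],
    candidate: dict[str, Any],
    weights: dict[str, float],
    match_nulls: dict[str, bool],
    similarity: Callable[[Any, Any], float] = exact_similarity,
) -> float:
    """Weighted average of per-field similarity (assumed formula, not EBX®'s)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one field needs a positive weight")
    score = 0.0
    for field, weight in weights.items():
        a, b = base.get(field), candidate.get(field)
        if a is None or b is None:
            # Assumed null policy: two nulls are similar only if enabled for the field.
            both_null = a is None and b is None
            field_score = 1.0 if (both_null and match_nulls.get(field, False)) else 0.0
        else:
            field_score = similarity(a, b)
        score += weight * field_score
    return score / total_weight


base = {"last_name": "Lee", "city": "Paris", "fax": None}
candidate = {"last_name": "Lee", "city": "Lyon", "fax": None}
weights = {"last_name": 3, "city": 2, "fax": 1}

print(round(preprocessing_score(base, candidate, weights, {"fax": True}), 2))
# prints 0.67
print(round(preprocessing_score(base, candidate, weights, {"fax": False}), 2))
# prints 0.5
```

Because the fax field carries the lowest weight in this sketch, enabling or disabling null matching shifts the overall score only slightly, while the higher-weighted fields dominate the result.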

Pre-processing optimizations reduce cluster size

Where possible, the pre-processing phase automatically reduces the cluster size to optimize performance. During this phase, the decision tree is analyzed as follows:

The system identifies "mandatory" fields, that is, fields whose similarity is required for records to be considered matches or suspects. Records without similarity in these fields are excluded from further processing, which can significantly improve performance by reducing the number of candidate records. For example, the Customer_Email field is mandatory in a decision tree where it is included in all paths that lead to a MATCH or SUSPECT output. In this case, records that do not have the same email address are never considered candidates and are not passed to the decision tree for evaluation.
Currently, this behavior applies only to data comparison nodes that use the All fields match evaluation and one of the following algorithms: Exact, or Full text with a minimum score of 100%.
The system analyzes each decision tree path that leads to a MATCH or SUSPECT output to identify groups of fields whose similarity is essential for records to match. A minimum required score is computed from the smallest group of fields. Records must meet or exceed this score; otherwise, they are not passed to the decision tree for evaluation. Take, for instance, a decision tree with five fields configured with the same weight, in which at least three of the fields must be similar for records to match. In this case, candidate records need a minimum similarity score of 60% (three of five equally weighted fields) to be passed to the decision tree.
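
The two analyses above can be sketched in Python as follows. This is a simplified model, not the add-on's actual algorithm: each decision tree path that leads to MATCH or SUSPECT is represented as the set of fields that must be similar along that path, the function names are hypothetical, and the "smallest group" is assumed to be the group with the lowest total weight. The restriction to specific node evaluations and algorithms is left out.

```python
from itertools import combinations


def mandatory_fields(paths: list[set[str]]) -> set[str]:
    """Fields present in every path that leads to a MATCH or SUSPECT output."""
    if not paths:
        return set()
    return set.intersection(*paths)


def minimum_required_score(paths: list[set[str]], weights: dict[str, float]) -> float:
    """Weight share of the smallest required field group (assumed formula)."""
    if not paths:
        raise ValueError("at least one MATCH or SUSPECT path is required")
    total_weight = sum(weights.values())
    smallest_group_weight = min(sum(weights[f] for f in path) for path in paths)
    return smallest_group_weight / total_weight


# Customer_Email appears in every path, so it is mandatory.
email_paths = [{"Customer_Email", "Name"}, {"Customer_Email", "Phone", "City"}]
print(mandatory_fields(email_paths))
# prints {'Customer_Email'}

# Five equally weighted fields, any three of which must be similar.
fields = ["f1", "f2", "f3", "f4", "f5"]
weights = {field: 1 for field in fields}
three_of_five = [set(group) for group in combinations(fields, 3)]
print(minimum_required_score(three_of_five, weights))
# prints 0.6
```

The second example reproduces the 60% threshold from the five-field case: no single field is mandatory, but every qualifying path requires three of the five equally weighted fields.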

TIBCO EBX® Version 6.2.1. Copyright © 2001-2025. Cloud Software Group, Inc. All rights reserved.

All third party product and company names and third party marks mentioned in this document are the property of their respective owners and are mentioned for identification.