TIBCO EBX®
Match and Merge Documentation > User Guide
Navigation modeMatch and Merge Documentation > User Guide

Matching

Introduction

The EBX® Match and Merge Add-on finds records that might be duplicates. You can run it manually and configure it to run automatically during certain operations. You can search within a table and configure it to search related tables. The results include a score that indicates how closely records match. You can resolve matches, or configure the add-on to automatically handle conflicts in different ways based on the score.

Matching configuration

This section describes how to configure add-on functions.

Matching configuration overview

Before using matching, configuration is needed to define:

Note

You can test new or existing configuration settings to ensure they returns the desired results. See Testing a matching policy for more information.

Configuration can be managed by staff that are in a 'Matching Steward' role. During day-to-day data modification, this configuration should not be modified by other staff such as those with 'Data Steward' roles.

The four main levels of matching configuration that require management include: Table configuration, Process policy, Matching policy and Survivorship policy.

A partial view of the logical data model for matching configuration is presented in EBX® Match and Merge Add-on data model section. In this user guide, both the business and logical names of each table are highlighted so that it is easy to refer to the logical data model if needed.

/Matching_configuration_1.png

The TIBCO EBX® Match and Merge Add-on dataset located under the EBX® 'Administration' tab is used to configure matching policies, cleansing policies and crosswalk policies.

The 'Matching reference data' dataset (also located under the EBX® 'Administration' tab) contains reference data used by the add-on. The 'Matching state machine configuration' dataset must not be modified.

Special notation:

/100002010000000E0000000E3C009B93.png

If the matching configuration is not correct at execution time, then any creation or modification of records is set to unmatched. Even though matching configuration quality and integrity are checked when configuring the add-on, it is still possible that some matching use cases are not covered by the configuration. For example, a business context could be raised but not yet be declared in the current matching configuration.

Concepts

EBX® Match and Merge Add-on configuration relies on these key concepts: table declaration, process policy, matching policy and survivorship policy.

/100000000000048B000003269DFA8CE8.png

In the above illustration, the 'Table' declares which tables are under the add-on control. The list of existing tables is obtained by configuring a data model containing the targeted tables. In the figure, the 'PartyName' table is set under add-on control. The localization of this table can evolve to other dataspaces and datasets over time. It will not change this declaration. A table requires a unique declaration regardless of its localization in many dataspaces and datasets.

A process policy defines a add-on process strategy such as: use of the workflow, import mode, inline matching, etc. Multiple strategies can be configured for one table. It is possible to change the strategy by data life cycle contexts based on a dataspace and dataset.

A matching policy defines the add-on algorithms to apply on the fields involved in the matching score computation. Several matching policies can be defined by datasets and dataspace, by business contexts (values of fields in the record to match) and by workflow.

A survivorship policy defines the rules to apply when executing the automatic merge of suspect records into golden or pivot records. They can be contextualized by dataspace and dataset.

Creating a test configuration

Once the add-on metadata is included to your table (refer to Installation and first configuration section), you can create a simple configuration that helps you determine how well the add-on works in your environment.

After testing this initial configuration, you can create another to import your data or initialize an existing set of records (refer to Installation and first configuration section). Then you will be ready to create your own portfolio of matching policies.

This table highlights setting your first test configuration.

Table

Field to configure at minimum

Table

Select the data model and then the table.

Select the your data language.

Other properties do not need to be used or modified for this minimum configuration.

Process policy

Select your table.

Other properties do not need to be used or modified for this minimum configuration.

Your process policy is set up by default to run in a direct matching mode without a workflow.

Matching policy

Select your table.

Set the 'Active' property to 'Yes'.

Select the Main matching algorithm.

Other properties are not used or modified.

Matching field

Select your matching policy and submit.

Select the field you want to use to match your data.

Other properties are not used or modified.

Table 1: A first quick and easy configuration

The Data life cycle context, Matching policy context, Survivorship policy, Survivor field, Source, Table trusted source, Field trusted source tables are not required for your initial test configuration.

Checking the configurations

You can use the 'Check add-on configuration' operation to determine whether a table's configuration is valid. The operation can raise errors.

Checking matching configuration is available on demand. It does not prevent the add-on from executing.

/Checking_The_Configuration.png

At execution time, if any errors exist, then user messages are raised and the default policy is applied: records are set to unmatched for any creation and modification.

/Checking_The_Configurations_2.png

Special notation

/100002010000000E0000000E3C009B93.png

Checking of Matching policies executes if the 'Use matching' ('Table' configuration) property is set to 'Yes'. Checking of Cleansing policies executes if the 'Use cleansing' property is set to 'Yes'.

Table configuration

Use this table to register your tables with the add-on and enable desired features, such as cleansing and crosswalk.

To register a table with the add-on:

Once you save the configuration, other properties display to enable or disable add-on features (Matching, Cleansing, Crosswalk, language, etc.). Please note that these properties only display when the registered table includes DaqaMetaData.

Table table (logical name: TableConfiguration)

Properties

Definition

Code

Any naming convention without white spaces.

Data model

The data model that contains the table to set under the add-on configuration.

Table

Name of table managed by the add-on. When the table has not been extended with the matching metadata type, then only the cleansing and crosswalk features are available.

Business ID

One or more fields used as the business identifier. This identifier is used by the matching policy relying on a literal score = Exact in order to group suspect records that have identical business identifiers (no fuzzy search, see example in Matching policy with exact score section).

Source field

Field in the table that is used for getting the record source.

This parameter is optional and is used when the record selection policy is 'most trusted source'.

A multi-valued field cannot be selected as a source field. To use such a field you need to create a new field with a function that aggregates the values into the accepted format. Then, this field can be used as a source field.

Field to exclude records from match

Field in the table that is used for excluding records when the match operation is executed. This parameter is optional and is used when the matching policy 'Exclude records from matching' property is not null.

Language

Language used by the matching procedure. The language is selected from 'ebx.locales', that is, the languages managed by EBX®.

For international languages, meaning many languages, English must be selected.

Max number of records in cluster

The maximum number of records that can be matched with the pivot in a cluster when matching is executed. Note that this number does not include the record being matched. For example, if you specify a value of 7, there can be 8 records in the cluster.

Set to undefined when the maximum number of records in a cluster is unbounded. Please, note that:

  • When this number is too high (more than 20 records) careful analysis of suspect records can be difficult.

  • When this number is too low (less than 5 records) then relevant suspect records can be rejected by the add-on. This is because only the best suspect records are kept in the cluster. For instance, if this limit is set to 5, then no more than five suspect records will be kept in the cluster even if more suspect records have scores that deem them important.

  • In batch execution mode ('match at once' services) it is better to set the 'Max number of records in cluster' property higher so that the suspect records are more rapidly collected in clusters.

Max number of records for group

The maximum number of records that can be matched with a random record (considered as pivot) in a cluster used by the 'Group at once (unmatched)' and 'Group at once (to be matched)' operations to group records (not including a random record selected by the add-on to). Set to undefined when the maximum number of records in a group is unbounded.

Latest cluster number

The cluster register begins at 11 and is incremented by one to an unbounded level. Every table holds its own cluster register. The 'Latest cluster number' value is used by the add-on to keep in memory the last cluster identifier created for a table. All the datasets that come from the same data model share the same 'Latest cluster number'.

Disable matching trigger

It is possible to deactivate the matching trigger used by the add-on. This trigger allows you to put the table under add-on control. It is embedded in the data type that contains the matching metadata.

The de-activation of the matching trigger is used when it is needed to implement the invocation to the matching in a bespoke trigger (via the add-on's API) that lives together with other add-ons such as the TIBCO EBX® Insight Add-on or any other software invocation at the CRUD time.

If the matching trigger is inactive and the add-on's API not implemented then the table is no longer under the control of the add-on. Every new record will get an undefined state value that will be possible to set up later with the 'Set at once' operation.

If 'Yes': The matching trigger is inactive.

If 'No': The matching trigger is active.

Default value: 'No'

Event listener

Java implementation listening to matching events.

Use matching

If 'Yes': The matching feature is enabled for the selected table.

If 'No': The matching feature is not enabled.

The matching feature finds duplicate records and makes creation of a golden record possible.

When 'Use matching' is set to 'No', created and updated records are set to the Unmatched state in the '000' cluster. Records with Merged/Deleted states cannot be modified. Records that have 'Unmatched' or 'To be matched' states can be modified; however, their state won't change.

Use cleansing

If 'Yes': The cleansing feature is enabled for the selected table.

If 'No': The cleansing feature is not enabled.

The cleansing feature assesses table data quality and fixes defects, such as missing field values.

Use crosswalk

If 'Yes': The crosswalk feature is enabled for the selected table.

If 'No': The crosswalk feature is not enabled.

The crosswalk feature creates cross-reference records by performing a matching process.

Activate Monitoring

If 'Yes': The TIBCO EBX® Activity Monitoring Add-on stores execution status.

If 'No': The EBX® Activity Monitoring Add-on does not store execution status.

Default value: 'No'

Determines whether the TIBCO EBX® Activity Monitoring Add-on stores execution status. This setting applies to 'Match at once', 'Run match', and 'Exact match' operations.

Number of processed records to update status

Specifies the number of records the add-on must process before triggering a status update in the TIBCO EBX® Activity Monitoring Add-on. This value must be between 500 and 2000.

By default, the execution status is updated in the TIBCO EBX® Activity Monitoring Add-on every time 1000 records are processed.

Table 2: Table configuration properties

Process policy

A process policy defines a set of parameters that the add-on uses to execute matching. Multiple process policies can be defined and attached to a table controlled by the add-on. Two important concepts must be understood to configure a process policy: 'double matching' and 'user interaction & workflow'.

Double matching configuration

Every matching policy can be configured with two matching algorithms. The second algorithm is optional. It is used to deal with 'false negative' records that the first algorithm could ignore by error. A 'false negative' record is a record that should have been identified as a suspect record by the matching procedure, but was not. The second algorithm is executed if the score of the first algorithm is between 0% and the value specified by the 'Threshold second algorithm' property. All suspects that are detected by the second matching algorithm are set with a score of 'Stewardship min score' + '0.1' to put them in the right area defined for the first algorithm. Above the stewardship area an automatic record merge-based on survivorship rules-is applied. The second algorithm is not able to run an automatic merge and the score of every suspect is set to 'Stewardship min score' + '0.1'.

User and workflow interaction

The add-on can be configured to launch automatic workflows when a duplicate record is identified. A portfolio of workflow user tasks and scripts is provided to facilitate the design of any bespoke workflows (see Workflow integration section). When the add-on is configured with no automatic workflows, it still remains possible to launch global EBX® workflows outside the scope of the add-on. These workflows can use all user tasks and scripts provided with the add-on. Two modes of matching can be configured: 'Direct matching' to compute suspect records directly (not requiring user input), or 'Simple matching' with the 'Suspicious' state management (requires user input). See Record level matching section for more information. For the 'Simple matching' mode, it is possible to configure the add-on with the 'embedded at submit' property to directly display the duplicate records at submit time without workflow execution. The possible configurations to drive the user interactions and the workflow interaction are highlighted in the figure below.

/10000000000004A1000002750C46543D.png

Here are some common configurations:

/10000000000002F800000179F5773930.png

Table 3: Usual process policy configurations

Note

When creating a workflow, the Auto complete property must be set to false if you want to use the merge process output for the next step.

If the 'On process driven' policy is set to 'Yes' and something is not correct in the workflow configuration, either at the 'Simple matching' level or the 'Workflow by state' level, then the 'Direct matching and no workflow' configuration is used by default.

In addition to these configurations, it is possible to deactivate matching either on creation or modification of any record (see Processes applied to the creation and modification of records section). It is also possible to deactivate survivorship execution (automatic merge).

The services applied to a set of records, such as 'Match at once', do not create workflows no matter what configuration is used.

Description

Application

Example

Direct matching without workflow

Any creation or modification of a record entails a matching execution against the table. The states of the impacted records are immediately updated.

The workflow is not used.

Suitable when cleansing an existing table, importing bulk data and mass data entry.

This method is useful when a company does not need workflows or prefers to manage workflows outside of add-on control (*).

(*) All components of the matching UI (light and full views) can be easily integrated in any workflow, with a data context providing a record or cluster identifier.

Creation of a record that moves to the pivot state in a cluster that contains suspect records.

Creation of a record that becomes a golden record in the '001' reserved cluster.

Simple matching (human decision required when suspect records are detected)

Any creation or modification of a record executes a search for matching records. This search does not entail modification of the impacted record's states that are identified by the add-on. These records are presented to the user who decides to keep the new/updated record as a golden record, to cancel the new/updated record or to manage a merge procedure applied to the matching records. When there are no matching records, then the new/updated record is automatically considered a golden record.

A workflow for creation and modification can be configured.

Suitable in real-time data entry when creation or modification of a record must be matched against the table to determine whether or not it is a suspicious record.

Based on a human decision applied on the suspicious record, either the new/updated record is canceled or confirmed as a golden or a pivot record.

Note: it is possible to create any bespoke workflow with a first step for record creation and modification. Then, the second step is the simple matching view. In this case the workflow properties must be set to none.

Creation of a record that is flagged as a suspicious record. Then a workflow is launched to display all potential suspect records. The user decides how this suspicious record must be managed: golden record directly, cancel, merge with other potential suspect records.

Direct matching with workflow

The creation or modification of a record executes matching on the table. The states of the impacted records are directly updated. Depending on the state of the new/updated record a workflow can be configured. The add-on executes this workflow automatically.

Suitable in the same use case as 'Direct matching without workflow'.

Automatic creation of a workflow when a record moves to the pivot state.

Table 4: On process driven configuration

Processes applied to the creation and modification of records

The configuration of a process policy allows you to define different options applied to the creation and modification of records. The properties used are as follows: 'On matching process', 'On creation', 'On modification', 'Forced golden creation', 'Is import mode', 'Is import mode only' and 'On simple matching'.

/10000000000002DA000002C554E735A9.png

/1000000000000302000002CE6C1439CC.png

The following, shows common use contexts and their corresponding configurations:

/10000000000003680000013E76F4F062.png

Table 5: Example of processes applied to the creation and modification of records

Process policy configuration

A table's configuration must declare one active process policy with no context. When several process policies are active at the same time for different contexts, then the policy with no context is used by default for every execution that is not related to a defined context.

Process policy table (logical name: ProcessPolicy)

Properties

Definition

Code

Any naming convention without white spaces can be used.

Short description

The policy description.

Long description

Long description of the policy

Table

Link to the table used for this policy.

Active

If 'Yes': This process policy is used.

If 'No': This process policy is not used.

This field indicates whether or not the policy is used when running matches. Since the add-on manages the process policy context, one or more process policies can be active at the same time for a given table. To disable a process policy, set the active field to 'No'.

Only one process policy can be active for each context and a table.

One active process policy with no context must be configured. It is used by default process policy at execution time.

Default value: 'Yes'

Data life cycle context

A process policy can be defined for a specific dataspace and/or dataset. This property is a link to a data life cycle context (dataspace and/or dataset).

Leave the value undefined to deactivate the use of a context (any dataspace and dataset).

The priority of contexts is as follows:

  • dataspace and dataset

  • dataspace

  • No context

/100000000000017F000000CA699BAC74.png

In this example, the process policy P003 has the highest priority, then process policy P002 and P001.

On matching process (See use case in this user guide to understand the way of the add-on works depending on this parameter.)

If 'Yes': The table benefits from matching in the context defined by the 'Data life cycle context' property (any data space and data set if 'Data life cycle context' is undefined).

If 'No': Matching is inactive for the table in the context. The states of created records and modified records without a state are still systematically set to 'Unmatched'.

Default value: 'Yes'

On creation

If True: Matching is run at record creation.

If No: Matching is not run at record creation. The state of created records is systematically set to unmatched.

Default value: 'Yes'.

Search before create

When you enable Search before create, users can search existing records to avoid creating a potential duplicate.

On modification

If 'Yes': Matching is run at record modification. For Simple matching process (creation of suspicious records), the matching is run when matching fields are modified only. For all other cases, the matching is run when the record is modified whatever the fields.

If 'No': Matching is not run at record modification. If a modified record's state was not set, it will be set to Unmatched.

Default value: 'Yes'.

On survivorship

If 'Yes': The add-on merges records automatically when the match score is above the threshold set in the 'Stewardship max score' property. Note that any existing Merged records will be taken into account when performing the merge operation.

If 'No': Survivorship is disabled.

Default value: 'Yes'

Always apply survivorship

If 'Yes': The add-on merges records automatically when the match score is above the threshold set in the 'Stewardship min score' property. It means that every cluster created will be merged automatically.

If 'No': The add-on merges records automatically when the match score is above the threshold set in the 'Stewardship max score' property.

Default value: 'No'

Forced golden creation

If 'Yes': Record creation is set to a golden state in the '001' cluster without any matching execution. This property has no impact on procedures applied on modification and deletion of records. This property makes the 'Is import mode' property inactive.

When the 'Forced golden creation' property is set to 'Yes' then the workflow is deactivated.

If 'No': Forced creation of golden records is deactivated.

Default value: 'No'

Is import mode

If 'Yes': The policy is used in the context of bulk data import. On creation and/or modification record states are set to 'to be matched' (refer to the next property 'Is import mode insert only').

If 'No': The policy is used in the context of interactive creation and modification of records.

When the 'Is import mode' property is set to 'Yes' then the workflow is deactivated .

Default value: 'No'

Is import mode insert only

This property is used only if 'Is import mode'='Yes'

If 'Yes': Only new records move to 'to be matched'.

If 'No': Updated and new records move to 'to be matched'

Default value: 'No'

Merged record is recycled

If 'Yes': Modification of merged records is enabled.

When a merged record is modified behavior depends on property settings as follows:

  • If Modify merged without match is activated: If the Merged record and its target (Pivot or Golden) are in the same cluster, the merged record remains in the current cluster. Survivorship is applied to merge data from all Merged records in the cluster to the target record—'Is forced to void' is ignored. Otherwise, the Merged record remains untouched.

  • Otherwise, the merged record becomes 'Unmatched'. If its target is null or 'Deleted', 'Unmatched' or 'To be matched', then a match table operation will be executed on the unmatched record. Otherwise, a match table operation will be executed on its target. Eventually, if the unmatched record remains unaffected by previous steps, a match table operation will be executed on it.

If 'No': Modification of merged records is disabled.

This option is disabled when 'Is import mode' is set to 'Yes'.

Default value: 'No'.

Stewardship min score (%)

When a match score is lower than this threshold then the add-on does not consider the record as a suspect record.

Stewardship max score (%)

When a match score is higher than this threshold, the add-on enforces an automatic merge of this record into the corresponding pivot or golden record. This procedure is also known as 'survivorship'.

It is important to select this value carefully. If it is set too low, records may be automatically merged with pivot and golden records that are not actually related.

When scores fall between the 'Stewardship min score' and the 'Stewardship max score' values, human interaction is needed to decide how to manage the suspect record (stewardship process).

Threshold second matching(%)

When the score computed by the first algorithm is between 0% and the value specified by the 'Threshold second matching(%)' property, then the second algorithm is applied to detect potential false negative records.

For example, if 'Threshold second matching' is set to '5%' then the second algorithm is executed each time the score of the first algorithm is lower than '5%'.

Constraint: 'Threshold second matching' is lower than or equal to 'Stewardship min score'.

This property is used when 'Second matching algorithm' in the matching policy is not empty.

Default value: '0'

Second level stewardship min score (%)

When the second matching level is configured to find false negative records from a first level matching, then this threshold is used to evaluate the matching score and whether or not it provides a suspect record.

This property is used when 'Second matching algorithm' in the matching policy is not empty.

Not suspect with threshold (%)

When a record 'A' is 'Not suspect with' a record 'B', the score of 'B' against 'A' is kept by the add-on. When a new matching is executed, if the variance (new score - former score > 'Not suspect with threshold') then 'B' is no longer considered as 'Not suspect' and becomes a suspect of 'A'. For instance, 'Not suspect with threshold'=10% and score of 'B'=60% when it was stated 'Not suspect with'. If the new score of 'B' against 'A' is higher than or equal to 70% then it comes back as suspect otherwise it remains 'Not suspect with' the record 'A'.

On not suspect with

Allow to deactivate the 'not suspect with' feature.

Case 1: 'On not suspect with' = 'Yes' and new score - old score < 'Not suspect with threshold' (note: new score is the score of current matching, old score is the score of suspect record in the list of not suspect of pivot record):

  • The record is not suspect, and list of not suspect with in Pivot does not change.

Case 2: 'On not suspect with' = 'Yes' and new score - old score >= 'Not suspect with threshold':

  • The record is suspect, and it is removed from the list of not suspect with in the Pivot.

Case 3: 'On not suspect with' = 'No' and new score - old score < 'Not suspect with threshold':

  • Record is Suspect and it is removed from the list of not suspect with in the Pivot.

Case 4: 'On not suspect with' = 'No' and new score - old score >= 'Not suspect with threshold':

  • Record is Suspect and it is removed from the list of not suspect with in the Pivot.

If 'Yes': When a match is executed the 'not suspect with' records participate in the matching only if the 'not suspect with threshold' is reached.

If 'No': When a match is executed the 'not suspect with' records participate in the matching as regular records.

Default value: 'No'

Bidirectional not suspect with

If you enable this property and run the 'Not suspect with' service on a Suspect record, the add-on saves any 'Not suspect with' records in the Pivot and Suspect record's metadata tabs.

On suspect record retention

If you set this property to 'Yes', when suspect records no longer match with their Pivot records, the score of the impacted cluster will be recalculated. A new pivot is selected based on the 'Pivot selection mode' property setting. Survivorship is not applied; only the score is recalculated.

This strategy can be changed by setting the 'On suspect record retention' property to 'No'. With this configuration, the suspect records that no longer match with the pivot record will be moved to the unmatched state.

Default value: 'Yes'

Cluster retention on matching suspect

You can use this property when running a 'Match at once' in full mode, 'Match table' or 'Modify' operation on Suspect records. For the following examples, consider Suspect record A as the Pivot. The add-on matches A against all other records in the table. However, when running a 'Match at once in full mode' or 'Match table' operation, the match excludes records in A's cluster.

  • 'Match at once in full mode' — For this example, assume that record B scores the highest against A. If B's new score is greater than A's current score:

    • When 'On cluster retention for matching suspect' is set to 'Yes', the add-on moves A into B's current cluster. However, if B is located in a predefined cluster, the add-on moves A and B into a new cluster.

    • When 'On cluster retention for matching suspect' is set to 'No', the add-on moves A and B into a new cluster.

  • 'Match table' — Assume that records B, C, and D score higher than A and that B has the highest score of all.

    • When 'On cluster retention for matching suspect' is set to 'Yes', the add-on moves A, C and D into B's current cluster. However, if B is located in a predefined cluster, the add-on moves A, B, C, and D into a new cluster.

    • When 'On cluster retention for matching suspect' is set to 'No', the add-on moves A, B, C and D into a new cluster.

  • 'Modify' on a Suspect record — This entails the same behavior as a 'Match table' operation on a Suspect record. But, in this case the add-on matches A against all other records in the table including those in A's cluster.

Default value: 'No'

On process driven

If 'Yes': The configurations defined with 'On simple matching', 'On workflow by state' and 'On workflow for survivorship' are used.

If 'No': Direct matching without workflow is executed.

Default value: 'No'

On simple matching

'On simple matching'

  • This property activates/deactivates the 'On simple matching' property. Simple matching creates suspicious records when records are identified as potentially suspect.

  • If 'Yes': Simple matching is active (creation of suspicious record) and the properties contained in the 'On simple matching' group are used at execution time.

  • If 'No': Simple matching is not active and the properties contained in the 'On simple matching' group are not used.

  • Default value: 'No'

'Under workflow'

  • 'Workflow on creation'

    • The name of a workflow the add-on automatically launches in case a record is deemed as suspicious at creation time. 'None' if no workflow is to be used.

    • Default value: 'None'

  • 'Workflow on modification'

    • The name of a workflow the add-on will launch automatically in case a record is deemed as suspicious at modification time. 'None' if no workflow.

  • Default value: 'None'

'Under submit'

  • 'Embedded at submit'

    • If 'Yes': In case of a suspicious record, the list of duplicate records is automatically displayed at creation and modification time (the simple matching view is used).

    • If 'No': No automatic matching UI embedded at the submit on creation and modification.

    • Default value: 'No'

  • Embedded functions

    • If 'Yes': the function is available

    • If 'No': the function is not available

    • Default value: 'Yes'

      • Make definitive golden

      • Make golden

      • Options on Make golden

        • Make golden without merging

        • Make golden and force all records to merged

        • Make golden and force selected records to merged

      • Delete

      • Run stewardship on suspicious

      • Run stewardship on pivot

      • Run stewardship on best record

      • Best record selection mode

      • Merge and set golden

      • Merge and set definitive golden

      • Hide suspicious from list of duplicates

If a workflow is configured for creation (or modification) then the embedded property at creation time (or modification) is no longer possible.

When no workflow and no embedded properties are configured then the add-on does not provide any information to the user when a suspicious record is raised. This is the responsibility of the application to manage (or not) a process to inform the user.

The embedded mode is possible only through the regular EBX® view, not from the light and full matching views since they already provide a full inline integration.

On workflow by state

To make this property active: 'On process driven'='Yes' and 'On simple matching'='No'.

This property allows you to declare a workflow that the add-on will launch automatically depending on the matching result.

For example, if [suspect, myWorkflowToManageSuspect] is configured, then this workflow will be created after the creation or modification of any record moving to the 'Suspect' state.

An undefined value is used to deactivate this configuration.

Default value: 'undefined'

1. State: This property allows you to declare the state that records become after a matching operation.

2. Workflow name: Name of the workflow launched when a record moves to the state defined by the 'State' property after a matching operation.

When the add-on launches a workflow due to record creation or modification, the following variables must be declared in the data context.

- branch: Reference of the data space.

- instance: Reference of the data set.

- tablePath: Path of the table.

- xpath: XPath of the record.

- workflowIdentifier: Unique identifier to track the record.

On workflow for survivorship

To make this property active: 'on process driven'='Yes' and 'On simple matching'='No'.

Name of the workflow launched by the add-on in case of survivorship (at least one automatic merge has been executed).

An undefined value is used to deactivate this configuration.

Default value: 'undefined'

When the add-on launches a workflow due to record creation or modification, the following variables must be declared in the data context.

- branch: Reference of the data space.

- instance: Reference of the data set.

- tablePath: Path of the table.

- xpath: XPath of the record.

- workflowIdentifier: Unique identifier to track the record.

Options for Parallel Match at once:

In the 'Match at once' screens, you can configure properties to divide records into groups and execute matching on these groups simultaneously. After matching completes, results are merged into the original data.

If the number of groups is bigger than the number of threads, groups are evenly distributed in the different dataspaces/threads.

This technical setting is equivalent to configuring the Filter by property in the Matching policy. It has no impact on the returned result, but improves the performance(memory and speed).

Filter by

The add-on uses the fields in 'Filter by' to group records containing the same value into a thread. When you add a filter occurrence, you select one or more fields within the table:

  • When you choose one field — the add-on groups all records that contain the same value in that field. For example, if you selected Country, all records with the same Country value are put in the same thread; such as, USA in one thread, France in another and Vietnam in another.

  • When you add more than one field — the values in all specified fields must match to be included in a thread. For example, if you specified Country and Gender, the add-on puts all records with France and Female in one thread and those with France and Male into another thread.

Number of threads

Specifies the number of threads executed in parallel. For each thread, the add-on creates a dedicated temporary dataspace inside the original dataspace on which the Match at once operations are executed.

Default value: 4

Commit threshold

The add-on commits results of Match at once operations in temporary dataspaces to the original dataspace when the number of executed records reaches this threshold. This threshold must be tuned according to your database.

Default value: 100000

Table 6: Process policy properties

Examples - Process policy configurations

In these examples a table called 'Articles' is considered. You can configure your own table.

Direct matching without workflow

/direct_matching_without_using_workflow.png

Simple matching

Set the 'On process driven' property and 'On simple matching' to 'Yes' to enable it.

/simple_matching_configuration.png

Bulk data import

/bulk_data_import.png

Bulk data import for new records only

/bulk_data_import_new_records.png

Direct creation of golden records (interactive and import)

/golden_record_direct_creation.png

Deactivation of the matching

/deactivation_of_matching.png

Deactivation of the matching when modification of records

/deactivate_matching_on_record_modification.png

Initialization from an external source. All records are set to unmatched

/initialization_to_unmatched.png

Initialization from an external source. All records are set to golden

/initialization_to_golden.png

Initialization from an external source. All records are set 'to be matched'

/initialization_to_to_be_matched.png

User interface

In this section, you will find information on how to configure the user interface and detailed descriptions of all properties under the User interface tab. This tab allows you to regroup all user interface settings for a given table. They are organized by service available to the end user.

/User_Interface.png

Properties

Definition

Full data quality stewardship

Default view - top part

The add-on provides the matching view for the top part by default. You can customize it to fit your needs. To configure a custom view:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in the Default view - top part property under the Full data quality stewardship group with the published name of your custom view. Your custom view must contain the State column displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the score and cluster columns display.

Default view - bottom part

The add-on provides the matching view for the bottom part by default. You can customize it to fit your needs. To configure a custom view:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in Default view - bottom part property under the Full data quality stewardship group with the published name of your custom view. Your custom view must contain the State column displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the score and cluster columns display.

View by user profile

You can bind views to specific user profiles. This enables a user, or set of users to access specific matching views. To configure custom views for specific user profiles:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in View - top part and View - bottom part properties with user profiles and the published name of your custom view. Your custom view must contain the State column displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the score and cluster columns display.

State(s) to filter from selection

You can hide any unwanted states from the State button on the top right corner of the matching view.

Light data quality stewardship

Default view

The add-on provides the matching view by default. You can customize it to fit your needs. To configure a custom view:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in Default view property under the Light data quality stewardship group with the published name of your custom view. Your custom view must contain the State column displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the score and cluster columns display.

View by user profile

You can bind views to specific user profiles. This enables a user, or set of users to access specific matching views. To configure custom views for specific user profiles:
  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in View property under the Light data quality stewardship group with the published name of your custom view. Your custom view must contain the State column displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the score and cluster columns display.

State(s) to filter from selection

You can hide any unwanted states from the State button on the top right corner of the matching view.

Simple matching view

Default view

The add-on provides the matching view by default. It can be changed for every table depending on the need. To configure a custom view:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in the Default view property under the Simple matching group with the published name of your custom view. Your custom view must contain the columns State displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the Simple matching score(%) column displays.

View by user profile

You can bind views to specific user profiles. This enables a user, or set of users to access specific matching views. To configure custom views for specific user profiles:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in the View property with the desired user profiles and the published name of your custom view. Your custom view must contain the columns State displayed in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the Simple matching score(%) column displays.

Available actions

Determines availability of options in the simple matching view.

Merge view

Default view

The add-on provides the merge view by default. It can be changed for every table depending on the need. To configure a custom view:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in the Default view property under the Merge view with the published name of your custom view. Your custom view must display the primary key columns in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also a good practice to ensure the source column displays (if applicable).

View by user profile

You can bind views to specific user profiles. This enables a user, or set of users to access specific merge views. To configure custom views for specific user profiles:

  • Create a custom view in EBX®.

  • Publish the custom view.

  • Navigate to the User interface table, fill in the View property with user profiles and the published name of your custom view. The User profile property lists all profile and user available and View is a string parameter to fill a view ID. The mechanism is the same as the one used in the process policy settings for views. Your custom view must display the primary key columns in order to offer all features necessary for the add-on. If it doesn’t, the default table view will be applied. It is also recommended that you display the source columns (if applicable).

Apply EBX permission

Determines whether EBX® permissions apply. If a user does not have permission to view a field, it does not display. However, it is good to keep in mind that users in charge of merging must have access to all record fields.

Allow the pivot to be modified

Determines whether users can change the Pivot record in the Merge view by selecting its primary key.

Merge option

Defines the options available to merge records manually. The options are described below:

  • Set Golden will set the consolidated record as a golden record. It will participate in the future matching operations.

  • Set definitive golden will set the consolidated record as a golden record. It won't participate in the future matching operations.

  • Both The user will have the choice between both options at the end of the merge process.

Default value: Set Golden

Table 7: User interface

Matching policy

A matching policy is applied to a table and a matching context (optional) in order to declare how the add-on executes matching. For example, a matching policy can define which fields are used to compute the matching score and which algorithm is used.

To configure a matching policy:

Below is an example with a table named 'Party', structured as follows:

Field

Definition

Code

Any naming convention without white spaces

Organization type

{Company, Individual}

Nature

If Organization type = Company: {Domestic, Overseas}

If Organization type = Individual: {Prospect, Customer, Employee}

Name

Address

Country

For Overseas company only

Social number

For Employee only

Sales representative

For Customer only

Table 8: Example: 'Party' table

For this table, depending on the values of Organization type and Nature, specific fields are used to drive the matching process. The matching policy contexts needed are as follows:

Matching policy context

Matching policy

(fields used for matching)

Organization type

Nature

Company

Domestic

Name, Address

Company

Overseas

Name, Country

Individual

Employee

Name, Social number

Individual

Customer

Name, Sales representative

Individual

Prospect

Name, Address

Null

Null

Name

Table 9: Example of matching policy contexts for the table 'Party'

In this example, six different matching policy contexts are defined and associated with five different matching policies: (the policy '(Name, Address)' is used twice).

The add-on relies on three tables to manage the configuration of matching policies:

The relationships between these tables are as follows:

You can also define contexts per dataspace and dataset to execute matching policies depending on the data life cycle. This configuration is done directly in the 'Matching policy' table.

Matching policy configuration

In this section, you will find information on how to configure Matching policy and detail description of all properties under Matching policy group.

Note

You can test a new or existing matching policy configuration to ensure it returns the desired results. See Testing a matching policy for more information.

Matching policy context

Matching policy context table (logical name: MatchingPolicyContext).

Properties

Definition

Code

Any naming convention without white spaces can be used.

Table

A link to a table under the add-on's control.

Name

Name of the matching policy context.

Field contexts

Field context: a field in the table used as the context.

Use foreign key value: when enabled, the foreign key's value is used instead of its label. This option only displays when the chosen 'Field context' is a foreign key field.

Value: value of the field. The value is case sensitive.

  • For an empty value use the following constant: osd:is-empty

  • For a non-empty value use the following constant: osd:is-not-empty. In this case, if the field value is not null or not empty, the matching policy context is applied.

A multi-valued field cannot be selected as a field context. To use such a field, you need to create a new field with a function that aggregates the values into the excepted format. Then, this field can be used as a field context.

When a foreign key field is defined, the 'Default label' is used to compare by default. If there is no 'Default label' set, 'Programmatic label' is used. In case neither 'Default label' nor 'Programmatic label' set, the field value is considered empty.

Context Java class

A Java class you define to extend the MatchingPolicyContextChecker.java API.

  • When Field contexts are defined and Context Java class is not defined: The add-on uses context based on Field contexts

  • When Field contexts are not defined and Context Java class is defined: The add-on uses context based on Java class

  • If both of them are defined: The add-on uses context based on Java class. But inside Java class, users can get the list of Field contexts defined.

Table 10: Matching policy context properties

The use of matching policy contexts is optional. A matching policy can be configured without a context (see field 'Is context') in the Matching policy table below.

Any number of fields can be defined as part of a context. Fields are combined using the 'and' operator. Defining a context with only one field is possible. A 'field context' can be a value function that dynamically computes a record's context based on business rules. For instance, depending on a list of countries, the 'Language' context field can be computed automatically.

Matching policy

Special Notation:

/100002010000000E0000000E3C009B93.png

The configuration for a table must declare one active matching policy with no context. When several matching policies are active for different contexts, then the policy with no context is used by default for every execution not related to a context.

Matching policy table (logical name: MatchingPolicy)

Properties

Definition

Code

Any naming convention without white spaces can be used.

Short description

A description of the policy.

Long description

Long description of the policy.

Table

Table for which this policy is defined.

Active

If 'Yes': This matching policy is used.

If 'No': This matching policy is not used.

This field indicates whether or not the policy will be used when running matches. Since the add-on manages the matching policy context, it means that one or more matching policies can be active at the same time for a given table. To disable a matching policy, set the active field to 'No'. Only one matching policy can be active for each matching context. One active matching policy with no context must be configured. It is used by default matching at execution time when a matching execution has no relation to a defined context.

Is forced to void

If 'true': The matching policy short-circuits upon execution. This allows you to have an active matching policy for which matching does not execute.

If 'No': The matching policy executes per its configuration. When set to 'No', the property 'Is forced to void' has no effect on normal matching execution.

Default value: 'false'

Funneling matching

If 'Yes': The funneling matching mode is activated.

If 'No': The funneling matching mode is deactivated.

Default value: 'No'

  • By default the add-on computes the matching score by applying a weighted average on the field scores that participate in the matching policy. When a score applied to a field is lower than the 'Stewardship min score' (process policy) then it still participates in the average computation and its weighting is still used.

  • When the 'Funneling matching' option is used, each field is associated to a 'Field stewardship min score' (refer to Matching field table). If one of the fields does not reach this threshold then the matching is canceled. When each field is scored with a sufficient value, then a weighted average score is computed.

For match at once

The matching policy used for 'Match at once' operations. A table can have only one matching policy active at a time for this type of operation. If no 'For match at once' policy is specified, then the default policy is used by 'Match at once' operations.

Modify Merged without match

When a Merged record is modified with 'Merged record is recycled' activated, you can decide whether to re-run the 'Match table' operation. If the target record (Pivot or Golden) is auto-created: Survivorship applies to merge data from all Merged records which have the same target record and cluster as the modified Merged record. The 'No match when modifying Merged record' option is ignored by the add-on.

If 'Yes': When the Merged record and its target (Pivot or Golden) are in the same cluster, the merged retains in the current cluster and the Survivorship is applied to merge data from all Merged records in the cluster to the target record, 'Is forced to void' is ignored. Otherwise, the Merged record is untouched.

If 'No': The add-on keeps the current behavior when 'Merged record is recycled' is activated.

Default value: 'No'

For group at once

The Matching policy used for 'Group at once' operations. A table can have only one matching policy active at a time for this type of operation. If no 'For group at once' policy is specified, the default policy is used by 'group at once' operations.

For search before create

The matching policy used for the 'Search before create' feature. A table can only have one active matching policy with 'Search before create' enabled at a time. If the 'For search before create' policy is not specified, the default policy will be used.

Keep not matched records untouched

Determines whether records keep their current state after a matching operation produces no matches. This setting applies to the following 'Match at once' services: Match at once, Match at once full mode, Parallel match at once and Exact match at once.

If set to 'Yes': The Unmatched, To be matched, and Suspicious records remain in their current state when no matches are found.

If set to 'No' or not defined: Records that do not have any matches become Golden in cluster 01.

Default value: 'No'

Automatically create new golden

If 'Yes': The add-on automatically creates a new golden record when it identifies a positive match between records. This property is only applied in the case of record creation, modification or a 'Match table' operation. It is used only if the survivorship property is 'On' and if survivorship rules are defined. The process used by the add-on for golden creation is as follows:

  • From the matched records, the add-on selects a pivot record based on the survivorship policy's Survivorship selection mode value.

  • The pivot record moves into the suspect state and the merge process executes.

  • The golden record is obtained by merging all suspect records.

If 'No': A new golden record is not automatically created when there is a match.

Default value: 'No'

Note that if there is already an 'Auto-created' record in the cluster, that record will be chosen as the new golden.

Auto create new golden in match at once

If set to 'Yes': The 'Automatically create new golden' service applies when running a 'Match at once' service including 'Exact match at once' and 'Exact match at once in memory'. It is used only when you set the survivorship property to 'On' and define survivorship rules.

If set to 'No': The 'Automatic create new golden' is not applied when running a 'Match at once' service.

Default value: 'No'

Note that if there is already an 'Auto-created' record in the cluster, that record will be chosen as the new golden.

Auto create new golden for single golden

If set to 'Yes': When a match results in the creation of a single Golden, the add-on duplicates the record and moves both records into a new cluster. The auto-created record becomes Golden, and the former single Golden becomes Merged targeted to the new auto-created record. The newly created Golden record has the same behavior as other records as it is modified or under a matching operation.

This property is applied:

  • On record creation and modification.

  • On 'Match table' and 'Match at once' operations.

  • When running 'Set golden' on Suspicious, Unmatched, Suspect and Pivot records.

  • When running 'Remove and set golden' on Suspect records.

  • When running 'Set back golden' on Suspect and Pivot records.

If set to 'No' or not defined: The 'Automatic create new golden' is not applied.

Default value: 'No'

Customize source value for new golden

If 'Yes': When the 'Automatically create new golden' property is activated, the value specified by the 'Source value' property updates the source field's value in the newly created golden record. The 'Source value' property is configured in the matching policy.

If 'No': The source field value in the newly created golden record isn't updated.

Default value: 'No'.

Source value

A source identifies a system that provides data either at the record or field level. The source is selected from the 'Source' table.

Is context

If 'Yes': This policy is executed for a matching context defined by the link to the 'Data life-cycle context', the 'Matching policy context' table and/or the 'Workflow context'. When several matching policies are candidates for a composite context based on two or three values among 'Data life-cycle context', 'Matching policy context', and 'Workflow context' then a priority is applied as follows, based on a weight average: 'Data life-cycle context' = '2', 'Matching policy context' = '1', 'Workflow context'='4'. The selection procedure is as follows:

  • Look up all active matching policies for the table with 'Is context'='Yes'.

  • Filtering by context workflow value if not undefined.

  • Filtering the previous list by dataspace / dataset if not undefined.

  • If it rests more than one matching policy at this step, meaning that there is at least one with a matching context.

  • Then try to use a matching context that fits with the record values.

  • If OK then the matching policy is found.

  • If OK either it still remains a matching policy with matching context is undefined and then the add-on uses it.

  • Or this is impossible to find a matching policy and then the default one must be used ('is context='No' and 'Active'='Yes').

If 'No': This matching policy is not related to any matching context.

Default value: 'No'

Data life cycle context

A matching policy can be defined for a specific dataspace and/or dataset.

This property is a link to a data life cycle context (dataspace and/or dataset).

An undefined value deactivates this context.

Matching policy context

Matching policy context for which this matching policy is defined. An undefined value deactivates this context.

Workflow context

When executing a workflow you can define a property (cf. Workflow User tasks) that specifies execution of a specific matching policy. Then, the 'Workflow context' property is set to the same value as the one used in the user task.

An undefined value here deactivates this context.

Pivot record selection mode

This rule is applied by the add-on (Match table and Match at once operations) to decide which of the suspect records is the pivot record (see Record selection policy section for further information on the different strategies that the add-on can apply).

This policy is disabled when 'Golden is preserved for selection' or 'Automatically create new golden' is applied.

Default value: 'New update'

Note that auto-created records are given the highest priority.

Golden is preserved for selection

If 'Yes': The Pivot record selection mode will prioritize a record identified as 'Golden' or 'was golden' if it exists in the cluster. If many records with the 'Golden' or 'was golden' identifier exist, the 'Most recently acquired' record is used. If there is no record with 'was golden' marker or no 'Golden' record, then the 'Pivot record selection mode' executes normally. This property is used to select the pivot record in case of record creation and modification only. It is not used with the 'Match table' and 'Match at once' operations.

If 'No': The 'Pivot record selection mode' does not use the 'was golden' indicator or 'Golden' state.

Default value: 'No'

A record is identified as 'was golden' when its state was golden but has been changed because of a new matching.

Main matching algorithm

The default algorithm for matching all fields that use this matching policy. If required, the matching policy algorithm can be changed at the field-level (see table Matching field). You can refer to the Matching strategies section for information that will help you decide which algorithm to use.

The selection of the main algorithm is mandatory.

Second matching algorithm

The algorithm used to manage records that could be false negative results after main algorithm execution. It is used after the first matching procedure to ensure that no suspect records have been missed. It is not possible to override this matching algorithm at the field-level. You can refer to the Matching strategies section for information that will help you decide which algorithm to use.

Ignore case

If set to 'Yes': All algorithms and the following 'Exact matching' operations ignore case sensitivity:

- Filter by

- Business ID

- Exact match at once

- Filtering field rule

- Synonym

If set to 'No': All algorithms and 'Exact matching' operations will be case sensitive.

Default: 'No'

Exclude records from matching

If you want matching to ignore records when a field contains specific values:

  • Specify the field you want to monitor using the 'Field to exclude records from match' property. You can do this at the 'Table' configuration level.

  • Use this property ('Exclude records from matching') to define which field values identify records the matching operation should ignore.

When one of these values is equal to the field value of record then the record is ignored by the matching operation.

When a record is created and has excluded value, then its state moves to 'To be matched'. When a record is modified and has excluded value, then its state is not modified.

To declare an empty value, use the following constant: osd:is-empty

Filtering record rule

A business rule can be applied to filter records that must be excluded when matching against the pivot record. This feature is used when the matching policy's 'Exclude record from match table' property is not sufficient (based on a direct equal value of string). The applied rule is configured in the 'Filtering record rule' table. The bespoke parameters group of fields passes parameter values to a rule when needed. Then, the rule needs to be able to manage these parameters.

When a record is created and matches with the filtering rules, then its state moves to 'To be matched'. When a record is modified and matches with the filtering rules, then its state is not modified.

Filter by

When matching a large volume of records, perhaps numbering in the millions, you can improve fuzzy match response time by using a fast filter that returns a subset of records to match. The filter used a 'Exact' query. The fuzzy match is then applied on this subset of records only. You can combine multiple fields in the filter, the 'And' operator is then applied. To guarantee the best performance, you must declare the filter field(s) as index (see EBX® documentation). The 'Match at once' operation uses the 'Filter by' property. However, the 'Exact match at once' operation does not because it acts as a filter on its own.

Handle null value for filter by

  • [Not defined]: If source and target values are null, comparison results in a value of true.

    Comparison results in a value of false if either source, or target are not null.

  • [If source and target values are null, comparison results in a value of false.]

    Comparison still results in a value of false if either the source, or target are not null.

  • [If source's value is not null and target's value is null, comparison results in a value of true.]

    Comparison results in a value of false if the source is null and the target is not null.

    Comparison results in a value of true if the source is not null and the target is null.

Default value: [Not defined]

Literal score

'Exact' value is available. When using the 'Exact' matching policy, this property means that two records are considered as potential duplicates when at least one business identifier field value is equal. This business identifier is defined in the 'Table' (See Matching policy with exact score example in the Matching policy section.) configuration.

No match records when same source

If 'Yes': Two records from the same source will never match. You can use this option if the 'Source field' is configured at the 'Table' level. The add-on checks to ensure no records from the same source exist in the target cluster prior to moving a record. If there is any record from the same source detected, the record remains in its state in the current cluster.

Source field value of suspect records will never be merged into the pivot record.

If 'No': The matching is achieved in a normal way when the records come from the same source that is based on the 'Source field' configuration at the 'Table' configuration level.

Default value: 'No'

Narrow search

Activate this option to reduce the memory usage for matching. When this option is activated for matching policies that have low stewardship min scores, only records that are closely similar are included during matching.

Default value: 'No'

Threshold matching

These threshold matching values are not mandatory. They are used instead of the corresponding values that are defined at the process policy level. If one field is provided then the add-on will use all of them rather than the values specified at the process policy level. In this case, the four values become mandatory.

Stewardship min score (%):

If undefined, the value configured at the Process policy level is used.

When a match score is lower than this threshold then the add-on does not consider the record to be a suspect record.

Stewardship max score (%):

If undefined, the value configured at the Process policy level is used.

When a match score is higher this threshold, the add-on enforces an automatic merge of this record into the pivot or golden record. This procedure is also known as 'survivorship'. Above this value, the record is merged automatically.

It is important to select this value carefully, as if it too low, records may be automatically merged with pivot and golden records that are not actually related.

When scores fall between the 'Stewardship min score' and the 'Stewardship max score' values, a user decision is needed to determine how to manage the suspect record (stewardship process).

Threshold second matching(%)

If undefined, the value configured at the Process policy level is used.

When the score computed by the first algorithm falls between 0% and the specified 'Threshold second matching(%)' then the second algorithm is applied to detect potential false negative records.

For example, if 'Threshold second matching' is set to '5%' then the second algorithm is executed each time the score of the first algorithm is lower than '5%'.

Constraint: 'Threshold second matching' is lower than or equal to 'Stewardship min score'.

This property is used when the 'Second matching algorithm' property in the matching policy is not empty.

Default value: '0'

Second level stewardship min score (%)

If undefined, the value configured at the Process policy level is used.

When the second level of matching is configured to seek false negative records from the first matching, then this threshold is used to evaluate if the matching score provides a suspect or not.

This property is used when the 'Second matching algorithm' property in the matching policy is not empty.

Matching through a relation

Use matching through a relation(s)

'Relation matching' allows the add-on to match data through existing table or join table relationships.

If 'Yes': For the matching operation, the add-on combines the table on which this policy is configured with any table related to it either by join or by relationship.

If 'No': The relational matching configuration is ignored.

Default value: 'No'

Relation record score weight

This property allows you to configure the matching score weight that is computed for the 'Relation table'. The matching score obtained for the 'Relation table' is combined with the score computed on the 'Table to match' through a weighted average.

Default value: '1'

Relation record stewardship min score

This is used when the 'Funneling mode' property in the matching policy is active.

If the score of the field is lower than the 'Field stewardship min score' then there is no match.

By default the value is set to 100 to make the threshold inactive.

Default value: '100'

Use join table

If 'Yes': The relation match is configured through a join table between the 'Table to match' and the 'Relation table'.

If 'No': The relation match is configured with a direct relation between the 'Relation table' and the 'Table to match'.

Default value: 'No'

Relation match with join table

Join table

  • Selection of the join table that links the 'table to match' with the 'relation table'.

Link: Join table → Table to match

  • Selection of the foreign key to use for the relation between the join table and the 'table to match'.

Relation table

  • Selection of the 'relation table'.

Link: Join table → Relation table

  • Selection of the foreign key to use for the relation between the join table and the 'relation table'.

Matching policy on Relation table

  • Selection of the matching policy to apply on the 'relation table'.

Relation match without join table

Relation table

  • Selection of the 'relation table'

Link: Relation table → Table to match

  • Selection of the foreign key to use for the relation between the 'relation table' and the 'table to match'

Matching policy on Relation table

  • Selection of the matching to apply on the 'relation table'

Table 11: Matching policy properties

Matching field

Configuration of the fields used in matching execution time.

Matching field table (logical name: MatchingFieldRule)

Properties

Definition

Code

Any naming convention without white spaces can be used.

Matching policy

Reference to the 'Matching policy' table.

Field

A field in the table that is analyzed by the referenced matching policy. This field is used during the matching procedure. If the field value is empty at execution time, then matching is not enforced.

It is possible to match inside a list. Two lists of values will match when at least one value matches the two lists. For instance (John, Carl, Paul) will match with (Frank, Theo, John).

Foreign key

A list of foreign key fields. You define a path to another field—an alternative to the one defined by the 'Field' property—to execute matching on by creating hops to navigate the relationships. The add-on uses the value from the last hop during the matching procedure. The foreign key 'Default label' is used to compare by default. If there is no 'Default label' set, 'Programmatic label' is used. In case neither 'Default label' nor 'Programmatic label' set, the field value is considered empty.

Handle null value matching

Provides null value matching strategies using the following options:

  • [Not defined]: The score is calculated base on the algorithm.

  • If both values are null, score is 100%, 0% otherwise.

  • If both values are null, score is 0%, ignore otherwise.

  • If one of the values or both are null, ignore.

  • If one of the values or both are null, returns 0%,

In case the returned matching result is 0%, the Surrogate field (if exists) is used to match instead of the configured matching field.

Note that an undefined foreign key matching field will be considered as null, and this property does not support multi-hop foreign key.

Matching algorithm

By default, the algorithm used is the one defined as the main algorithm in the table's matching policy. This default value can be changed to select another algorithm.

Score weight

When several fields of a source table are used for matching, the score weight is used to give a weight to each field score that the add-on uses to compute a weighted average.

Example: '0.5' means the score of the field is worth half (50%). If a matching field is empty then the weighted average does not take into account this field.

Default value: '1'

Score calculation

Defines how the score is calculated:

- Real score: The exact score returned by the algorithm.

- Normalize score: The score will be the maximum score of the maximum stewardship defined minus 0.1. This strategy can be used to avoid auto merge of records when they are not strictly identical.

- Default value: Real score

Field stewardship min score (%)

Used when the 'Funneling mode' property in the matching policy is active or in a surrogate field matching.

- When the 'Funneling mode' property is active: If the score of the field is lower than the 'Field stewardship min score' then there is no match. By default the value is set to 100 to make the threshold inactive.

- If the field score is lower than 'Field stewardship min score', surrogate field (if exists) is used to match instead of the configured matching field.

Filtering field rule

A business rule can be applied to filter the field value in order to ignore characters such as N/A, Inc., *, $, etc. before the match execution. A rule can perform any type of filter. One example is removing all figures in a string. The filter is applied to the pivot record and the records involved in the match. It is performed in memory and does not change the actual value of the data in the repository. Only string fields can be filtered. Foreign key values cannot be filtered. The bespoke parameters group of fields is used to pass parameter values to a rule when needed. Then, the rule must be able to manage these parameters. The filtering field rule must depend on the record's fields.

Use synonym group

A group of synonyms can be selected. If a field value in the suspect record does not match with the pivot, then a new matching is achieved with the synonym values rather than the initial value in the suspect field. The first positive match is used to get the final score.

When values are the same with any case difference, the score will be max score - 0.1.

Check synonym in all groups

If 'true': When matching uses the synonym mechanism, it searches through all child synonym groups.

If 'false': When matching uses the synonym mechanism, it searches through each separate child synonym group.

Smart synonym matching

When matching uses the synonym mechanism, it checks whether strings from the specified field match with values in the list of synonyms. If a value matches, the add-on replaces it with the shortest value before running match. If it matches 100% then the record score is equal to the maximum score -0.1. When there are several synonyms with the same length, the first value in the list will be taken to replace for other synonyms.

Use smart synonym matching:

If set to 'Yes': The Smart synonym matching is applied.

If set to 'No': The Smart synonym matching is not applied.

Default value: 'No'

Ignore case:

If set to 'Yes': Ignores case when Smart synonym matching is activated.

If set to 'No': Smart synonym matching is case sensitive.

Default value: 'No'

Surrogate fields

Defines the fields and comparison mode applied in a matching when the returned score on the configured matching field is lower than the 'Field stewardship min score'.

Field

List of fields in the matching table with the same data types which can be configured as alternative matching fields.

Comparison mode

  • First match(default): When a surrogate field matches the pivot's field with a score higher than 'Field stewardship min score'. Surrogate field match returns this score.

  • Best match: When all surrogate fields have been compared and at least one matches the pivot's field with a score higher than 'Field stewardship minscore'. Surrogate field match returns the best score among all matched surrogate fields.

Table 12: Matching field properties

Special notation:

/100002010000000E0000000E86672C34.png

Configuration of the weighted average to take into account an empty field.

Matching algorithm

The add-on provides several predefined algorithms.

Matching algorithm table (logical name: MatchingAlgorithm)

This table is located under the EBX® Administration Tab in the 'Matching reference data' dataset.

Properties

Definition

Name

Algorithm name (Eg. DoubleMetaphone, FuzzySearch, Levenshtein, etc.)

Description

Context of use.

Is prebuilt

If 'Yes': The algorithm is provided by the add-on.

If 'No': The algorithm is not provided by the add-on.

Is percentage score

If 'Yes': The algorithm provides a similarity score.

If 'No': The algorithm provides a binary score (Match=1, Unmatch=0). When the score is equal to Match, then the add-on systematically overwrites the score to 'Stewardship min score + '1' '.

Java class

Specifies a Java reference to provide a custom algorithm.

Table 13: Matching algorithm properties

Predefined algorithms

The add-on provides these predefined matching algorithms.

Algorithm

Default parameters

Description and Parameter configuration (if applicable)

Chinese

Allows matching in Chinese.

Double metaphone

Max code length = 4

This phonetic algorithm works best on short strings, such as proper names. It is especially adept at returning words or names whose actual pronunciation may be different than the search text entered.

The Max code length property limits the code length used to find possible matches. When you enter a search string, the algorithm encodes it as a key and returns words with matching keys. You should set this property to a value that reflects the length of text being searched. For example:

If you specify a value of 4, the algorithm encodes the three words "cricket", criket" and cricketgame" as "KRKT". The algorithm considers the three words a match. If you changed the value to 8, "cricket" and "criket" are still encoded as "KRKT". However, it encodes "cricketgame" as "KRKTKM". In this case, "cricketgame" no longer matches.

Note that this algorithm cannot be used to search numeric, date/time, or special character formats. Also, due to the way the algorithm processes phonetic structures, a search for "www" returns no result.

Double metaphone Levenshtein

Max code length = 4

Being a phonetic algorithm, Double Metaphone may fail to match misspelled words when the misspelling substantially alters the phonetic structure of a word.

The Double Metaphone Levenshtein algorithm can compute distance between two long strings, but at the cost to compute it, which is roughly proportional to the product of the two string lengths.

So, a combination of these algorithms reduces their limitations. Levenshtein may find similarity between encoded strings, and the length of encoded strings is limited by Double Metaphone.

Exact

This algorithm returns a matching score of 100% for empty field values. Even though this algorithm runs an exact match on the specified field(s), other fields can use different matching algorithms. This means one matching operation may include multiple algorithms. If you want to improve response time by executing a purely 'Exact' match, refer to the matching policy's 'Filter by' property. Alternatively, you can use the 'Exact match at once' service.

Full text

This algorithm finds a non-case sensitive, exact match of the entered keyword in data of longer strings.

FuzzyFullText

Similarity = 0.7

Prefix length = 0

This algorithm works best for general strings like those contained in descriptions. This algorithm finds a similar, or fuzzy, match of the keyword text entered.

The Similarity parameter determines how similar results have to be before they are returned. The higher you set the value, the fewer results and vice versa.

The Prefix length parameter specifies that a number of characters from the beginning of the keyword must exactly match data being searched in order to return a result. For example, if you set the value to 2 and use the keyword "Automotive", the algorithm only considers words that begin with "au" as potential matches.

FuzzyJapanese

Similarity = 0.7

Prefix length = 0

This algorithm performs a search on Japanese text and finds a similar, or "fuzzy" match. This algorithm allows you to use the following character types or any combination thereof: Kanji, Katakana and Hiragana.

The Similarity parameter defines a value between 0 and 1, which is used to set the required similarity between the query terms and the matching terms. The similarity level is calculated based on the Levenshtein algorithm. For example: For a similarity of 0.5, a term of the same length as the query term is considered similar to the query term if the edit distance between both terms is less than length(term)*0.5.

The Prefix length parameter specifies the number of characters-from the beginning of the search term-that must exactly match in order to return a result. For example: The keyword 'クリニク' will match 'クリニック' if the Prefix length < 4 and Similarity = 0.

FuzzyRussian

Similarity = 0.7

Prefix length = 0

Performs a search on Russian text and finds a similar, or "fuzzy" match.

The Similarity parameter defines a value between 0 and 1 to set the required similarity between the query term and the matching terms. The similarity level is calculated based on the Levenshtein algorithm. For example: For a similarity of 0.5, a term of the same length as the query term is considered similar to the query term if the edit distance between both terms is less than length(term)*0.5.

The Prefix length parameter specifies the number of characters-from the beginning of the search term-that must exactly match in order to return a result.

Japanese

Performs search on Japanese text. This algorithm allows you to use the following character types or any combination thereof: Kanji, Katakana and Hiragana.

Jaro Winker

threshold = 0.7 (a condition to add Winkler distance or not. Value is from 0 to 1)

This algorithm works best on short strings, such as proper names. It tallies the number of characters in common and places a higher emphasis on differences at the start of the string. Therefore, the lower you set the Threshold parameter, the more impact differences at the beginning of strings have. Threshold parameter values should be from 0.0 to 1.0.

Levenshtein

This algorithm works best for short strings where you expect few differences between the keyword and the data being searched. For example, this works well for dialects spoken in a particular part of the country, or by a specific group of people.

NGram

Item size (n) = 2

This algorithm partitions search criteria into subsets of a specified length called NGrams. You set this length using the Gram size property. For example, if you set this property to a value of 3, the algorithm splits the word PHASED into the following N-Grams: PHA, HAS, ASE and SED. PHASED is then added to the lists of words containing those N-Grams.

Keep in mind that if you set the size too small, the algorithm may not capture important differences and return too many terms. If the size is too large, the opposite is true and may result in few returned results. Therefore, when used for names, a value of 3 or 4 is recommended. For phone numbers, a value of 7.

Russian

This algorithm allows you to search text in Russian.

SearchDate

Threshold = 5

This algorithm allows you to search on fields with date or, date-time data types. In order for a date to match, it must be in the range specified by the search input plus/minus the number of days specified in the Threshold parameter. The closer the search input is to the data being searched, the higher the score.

When you increase the Threshold parameter's value, the score decreases: Score = 100-(distance*100/threshold)

SearchNumber

Threshold = 5

This algorithm allows you to search on fields with a numeric data type. In order for a number to match, it must be in the range specified by the search input plus/minus the value set in the Threshold parameter. The closer the search input is to the numbers being searched, the higher the score.

In order for a number to match, it must be in the range specified by the search input plus or minus the value set in the Threshold parameter. If the Threshold value increases, the score decreases. Score = 100-(distance*100/threshold).

Soundex

This phonetic algorithm works best on proper names. It returns similar-sounding words or names by converting the string being searched to a four-character code and returning words with the same code. Note that you cannot use this algorithm to search numeric, date/time, or special characters.

Korean

Perform search on Korean text.

FuzzyKorean

Similarity = 0.7

Prefix length = 0

This algorithm performs a search on Korean text and finds a similar, or "fuzzy" match.

The Similarity parameter defines a value between 0 and 1, which is used to set the required similarity between the query terms and the matching terms. The similarity level is calculated based on the Levenshtein algorithm. For example: For a similarity of 0.5, a term of the same length as the query term is considered similar to the query term if the edit distance between both terms is less than length(term)*0.5.

The Prefix length parameter specifies the number of characters-from the beginning of the search term-that must exactly match in order to return a result

Table 14: Predefined algorithms

Filtering

Filtering record rule

Filtering record rule table (logical name: FilteringRecordRule)

This table is located under the EBX® Administration Tab in the 'Matching reference data' dataset.

Properties

Definition

Name

Name of the filtering record rule.

Description

Context of use.

Java class

Specifies a Java reference to provide the rule implementation.

Is prebuilt

If 'Yes': This function is provided by the add-on.

If 'No': This function is not provided by the add-on.

Table 15: Filtering record rule properties

Filtering field rule

This table allows you to remove field values from a matching operation to improve results. For example, depending on configuration settings, the values 'Holding Company Inc.' and 'Holding Company' may not register as a match. You could specify that the value of 'Inc.' gets filtered out during the operation, which would then result in a match.

Filtering field rule table (logical name: FilteringFieldRule)

This table is located under the EBX® Administration Tab in the 'Matching reference data' dataset.

Properties

Definition

Name

Name of the filtering field rule.

Description

Context of use.

Value filter

The field to exclude during matching.

Java class

Specifies a Java reference to provide the filtering rule.

Two built-in classes are available:

com.orchestranetworks.addon.daqa.RemoveWordFilter - The add-on applies all values specified in the 'Value filter' property to whole words only. Note that using this class may impact performance.

com.orchestranetworks.addon.daqa.RemoveValueFilter - All values configured in 'Value filter' will be applied every time the value is found without taking into account the context.

You can define a custom filter by extending the 'MatchingFieldValueFilter' API.

Is prebuilt

If 'Yes': This function is provided by the add-on.

If 'No': This function is not provided by the add-on.

Table 16: Filtering field rule properties

Matching on several tables

The EBX® Match and Merge Add-on can be configured to match data sourced from several tables using their relationships, namely the foreign keys.

Special notation:

/100002010000000E0000000E3C009B93.png

See also the 'Using matching through relation(s)' property in the matching policy configuration. This property allows the add-on to use indirect FK (including join table) and not only direct FK.

The matching value of a foreign key is its label-in the language of the current EBX® session-rather than its language in the 'Table configuration'.

In the example below, a matching policy is configured to match the name of a 'Party' through his foreign key to the 'Company name' table:

/1000020100000397000000D2ECEBC969.png

The add-on uses the pivot record on the 'Party' table to get the associated pivot record in the 'Company Name' table, then performs matching on the list of company names. All records of the 'Company Name' table with a score higher than the 'stewardship min score' are used to set up a cluster of related 'Party' records.

Here is an example:

Party name

Company name

David

Orchestra

Durand

IZX

Bonnais

Delta

Albert

Delttaz

Bonney

Delttaz

When a match is executed on the pivot record 'Bonney' with a matching policy applied on the company name, then here is the list of suspect records. In this example, the company Delta and Delttaz is considered with a score higher than the minimum threshold to be a suspect.

Bonnais

Delta

Albert

Delttaz

Bonney

Delttaz

When the foreign key label relies on many fields, the add-on uses the average score of these fields.

Matching through relation tables

'Relation matching' allows the add-on to match data through related tables. The matching results on related tables are then used to calculate the score of data on the source table.

Types of relation matching:

Special notation:

/100002010000000E0000000E3C009B93.png

Both the source and target table must be registered with the add-on.

Properties

Definition

Relation record score weight

This property allows you to configure the 'Relation table's matching score weight. The add-on combines this table's score with the 'Table to match' score using a weighted average.

Default value: '1'

Relation record stewardship min score

This is used when the 'Funneling mode' property in the matching policy is active.

If the score of the field is lower than the 'Field stewardship min score' then there is no match.

By default the value is set to 100 to make the threshold inactive.

Default value: '100'

Using join table

If 'Yes': The relation match is configured through a join table between the 'Table to match' and the 'Relation table'.

If 'No': The relation match is configured with a direct relation between the 'Relation table' and the 'Table to match'.

Default value: 'No'

Relation match with join table group

The Relation match with join table group contains the following options:

  • Mode: Specifies the location of the table referenced by the foreign key. The default is Same dataset which implies from the same data model. The Other dataset option corresponds to a different dataset, but in the same dataspace. Other dataspace implies a dataset located in a different dataspace.

  • Join table: Select the join table that links the Table to match with the Relation table. Join tables have at least one foreign key to the table defined in the current Matching policy. These tables and the current table can be located in any dataset or dataspace. Although, if the Mode is set to Other dataset or Other dataspace, the join table must be in the same data model or a the dataspace or dataset containing the relation table. You don't need to register the join table with the add-on.

  • Link: Join table -> Table to match: Specify the foreign key used to link the join table and table defined in the current matching policy.

  • Relation table: Select the relation table directly linked by the join table. These tables and the current table can be in any dataset.

  • Link: Join table -> Relation table: Specifies the foreign key used to link the join table and the relation table.

  • Matching policy on Relation table: Select the matching policy to apply to the relation table.

Please note that if the join table does not come from the same data model as the source table, it must be located in the same dataset as the relation table.

Relation match without join table

The Relation match without join table group contains the following options:

  • Mode: Specifies the location of the table referenced by the foreign key. The default is Same dataset which implies from the same data model. The Other dataset option corresponds to a different dataset, but in the same dataspace. Other dataspace implies a dataset in a different dataspace.

  • Relation table: Select the related table. These tables and the current table can be in any dataset or dataspace.

  • Link: Relation table -> Table to match: Specifies the foreign key used to link the related table and the table to match against.

  • Matching policy on Relation table: Select the matching policy to apply to the relation table.

Table 17: Relation properties

To apply matching through relation(s):

Testing a matching policy

After creating or editing a matching policy, you might want to use the Check similarity service to test whether the policy produces the expected scores. One advantage of testing is that it operates on a small subset of records; you can see results immediately. Also, the Check similarity service does not change the matching metadata for the records involved. Therefore, you can try different configurations without having to update matching states after each test.

The Check similarity service allows you to compare the similarity of two records in the same table. To test, you choose one of the matching policies configured for the table. The policy specifies which fields to compare and the algorithms to use. You can look at the resulting scores to get a preview of how the add-on would handle these records during a matching operation with this policy.

The results returned by the service include the following information for each field:

To run the service:

  1. Navigate to the table containing the records you want to compare and select two records.

  2. From the table's Actions menu, select Match and Merge > Check similarity.

    /Testing_a_matching_policy_1.png

  3. Use the Matching policy drop-down menu to choose the matching policy you want to test and select Launch.

    /check-similarity-2.png

Examples

The sections below show examples of configuring and applying a Matching policy.

Matching policy with exact score

When the configuration literal score = 'Exact' is declared on a matching policy, the business identifier of the table is used as the main criteria for grouping suspect records. This business identifier is not necessarily the primary key, but is one or more fields defined in TIBCO EBX® Match and Merge Add-on → Table → Business ID.

When more than one Business ID field is configured, two records are considered as Suspect if any of the Business ID field values are equal.

The following is an example with an Employee table containing the 'Social number' business identifier. The primary key is a technical object identifier (oid).

Two policies are configured: PO1 and PO2 use the metaphone algorithm on the 'Last name' field, and PO1 is configured with literal score = 'Exact'.

When applying PO1, all suspects share the same business identifier value. PO2, however, does not take the business identifier into consideration as it does not specify literal score = 'Exact'.

/10000000000002C30000023D7C153049.png

The results are different depending on whether PO1 or PO2 is used.

Matching using Join table

In the Article model, 'Supply points' table is the join table of 'Articles' and 'Suppliers' table.

/matching_using_join_table_1.png

In order to apply matching through the Supply points join table , configure as below:

/relational_matching_using_join_table.png

When you modify record '21-Calendar' on Articles table, the add-on firstly finds records (7-Samsung and 3-Samssungg) in the 'Suppliers' table ('Relation table') that link to the modified record via the 'Supply points' table('Join table'). Those records are considered as the Suppliers table's pivots. The add-on then executes matching on those pivots using the Matching policy configured for the Suppliers table. The higher scores between two matches are taken as the final score of records in the 'Suppliers table.

/match_on_relation_table.png

Once matching finishes on the Suppliers table, the add-on performs a reverse look up to find Suspect records (22-calendar and 23-kalendar) in the Articles table based on the foreign key. Matching then executes using the Matching policy configured for the Articles table.

If there is no Matching field is configured, scores of the Suspect records in Articles table are scores of the corresponding records in Suppliers table.

/match_on_relation_result.png

In case 'Reduced label' is configured as the Matching field, the scores of Suspect records on Article are calculated as below:

/match_on_relation_result_with_matching_field_1.png

The add-on then combines results and calculates final score of Suspect records in the Articles table using matching field score weights:

/match_on_relation_result_with_matching_field.png

Smart synonym matching

In the following example, the Articles table is configured to use smart synonym matching with the SmartSynonym - Pennsylvania Avenue synonym group.

/matching_field_with_synonym.png

In the Articles table, there are two records with reduced labels '400 Pennsylvania Avenue' and '400 Penn Ave'. The 'SmartSynonym - Pennsylvania Avenue' group contains two child groups naming 'Pennsylvania' and 'Avenue' as shown below:

/matching_field_with_synonym_2.png

When you execute matching on the Articles table, the add-on checks if any strings of '400 Pennsylvania Avenue' existing in recursive child and parent groups of 'SmartSynonym - Pennsylvania Avenue'. If the string exists, the add-on replaces the original string with the shortest value before matching. '400 Pennsylvania Avenue' now becomes '400 Penn Ave' before matching and after matching returns a final score of 89.9% (max score - 0.1).

/matching_field_with_synonym_result.png

Matching with an alternative field - Surrogate matching

When the matching result on a record is lower than the configured 'Field stewardship min score (%)', you can use another field to match with the current matching field. Make sure that the current matching field is not a complex field, foreign key field, multiple-value field or enumeration field.

In matching field table → Surrogate field → Fields, select alternative fields in the drop-down list of fields which have the same data type as the current matching field in the matching table. You can configure more than one field to be the alternative matching fields.

As the following example, on Article table, executes 'Match table' on the record with 'Identification' 222305:

/surrogate_field_matching_1.png

/surrogate_field_matching_2.png

'Article type' value of record 222304 is used to match with 'Reduced label' value of record 222305. Record 222304 then becomes Suspect with the returned matching score of 89.9%.

/surrogate_field_matching_3.png

All information of the surrogate matching is stored in DaqaMetadata.

/Surrogate_stored_in_Metadata.png

Survivorship policy

Introduction

A survivorship policy is applied to a table in order to define how the add-on performs automatic merging of suspect records into a target record.

Once the survivorship target record is identified by applying a 'survivor record selection mode', the add-on executes an automatic merge from suspect records to this target record based on the following rule: only suspect records with a score higher than the stewardship maximum value defined in the process policy are merged.

The add-on relies on four tables to manage the configuration of survivorship policies:

The relationships between these tables are as follows:

Fields with data types 'value function', 'password' and 'uda' are not merged.

Special notation:

/100002010000000E0000000E3C009B93.png

At execution time, depending on the 'On survivorship' property configured in the 'Process policy', the automatic merge procedure is as follows:

  • If 'On survivorship' = 'No' no automatic merge.

  • If 'On survivorship' = 'Yes' and survivorship configuration is not valid or does not exist, then survivorship record is the pivot record and the fields of suspect records are not merged to the pivot.

Survivorship configuration

Survivorship policy context

Survivorship policy contexts ensure that automatic merge occurs only when field values meet certain conditions. Each context can apply to one or more fields. In turn, you can specify multiple values as conditions to satisfy for each field.

Survivorship policy context table (logical name: SurvivorshipPolicyContext)

Properties

Definition

Code

Any naming convention without white spaces.

Table

Link to a table under the add-on's control.

Name

Name of the survivorship policy context.

Field contexts

Field context: a field in the table used as the context

Use foreign key value: when enabled, the foreign key's value is used instead of its label. This option only displays when the chosen 'Field context' is a foreign key field.

Value: value of the field

For an empty value this constant must be used: osd:is-empty

A multi-valued field cannot be selected as a field context. To use such a field, you need to create a new field with a function to aggregate the values into the excepted format. Then, this field can be used as a field context.

Table 18 :Survivorship policy context properties

Survivorship policy

When defining this type of policy, you can use the Default survivor field to apply conditions for survivorship to all of a table's fields (excluding the primary key and source field). To define these conditions, the add-on allows you to:

When you define conditions in the Default survivor field, a record must satisfy the record must satisfy the function and condition to be survived.

Special notation:

/100002010000000E0000000E3C009B93.png

When several survivorship policies are active at the same time for different contexts, then the policy with no context is used by default for every execution not related to a defined context. If no survivorship policy is declared, then the pivot will be considered as survivor record and no automatic merge of fields is performed.

Survivorship policy table (logical name: SurvivorshipPolicy).

Properties

Definition

Code

Business code of the survivorship policy. Any naming convention is possible.

Short description

Description of the survivorship policy.

Long description

Long description of the policy

Table

Table for which this survivorship policy is defined.

Active

If 'Yes': This survivorship policy is used.

If 'No': This survivorship policy is not used.

At execution time, depending on the 'On survivorship' property configured in the 'Process policy', the automatic merge procedure is as follows:

  • If 'On survivorship' = 'No' no automatic merge.

  • If 'On survivorship' = 'Yes' and survivorship configuration is not valid or does not exist, then survivorship record is the pivot record and the fields of suspect records are not merged to the pivot.

Is context

If 'Yes': This policy is executed for the context defined by the link to the 'Survivorship policy context'.

When creating multiple context-based survivorship policies, you can define a default survivorship policy. When conditions do not meet those specified in the context(s), the add-on uses the default survivorship policy. If you choose to not define a default policy and conditions do not meet those specified in the context(s), the add-on takes no survivorship action.

If 'No': This survivorship policy is not related to any matching context.

Default value: 'No'

Survivorship policy context

Survivorship policy context for which this survivorship policy is defined.

Survivor record selection mode

Rule applied by the add-on to decide which record will be the survivor amongst a set of suspect records: refer to 'Record selection policy'. The add-on still automatically chooses a record based on the current settings if no survivor field is configured.

Note that auto-created records are given the highest priority.

Merge all records in cluster

If 'Yes': The add-on merges all data in the cluster including records with a score of -1. Their states then become 'Merged'. Note that this property is ignored when executing a 'Run survivorship on clusters' operation.

If 'No': The add-on merges data whose matching score against the Golden or Pivot record is higher than or equal to max score. In case of 'Automatic merge', data of Merged (-1) is not merged into Pivot or Golden record.

Default value: 'No'

Default survivor field

Function that will be applied for every attributes of the table (except the primary key that is selected with the 'Survivor record selection mode') to select the value to be survived in the golden record.

The default merge function will be overwritten if a survivor field is defined.

Survivorship function

A function that is applied to the field to select the best value amongst a list of records. These functions take into account any conditions specified in the 'Condition for field value survivorship' property.

Condition for field value survivorship

The condition is defined as a predicate expression. The Survivorship function will be checked first before moving on to Condition for field value survivorship. A record must satisfy both the Survivorship function and the predicate expression to be survived. If the value of this field is empty or invalid, it will be ignored by the add-on. In case, the predicate expression is invalid, it will be logged in the add-on log file.

For example:osd:is-not-null(./AddressLine1) and osd:is-null(./AddressLine2)

Execute if empty only

Decision on whether to apply survivorship function to merge data to Golden/Pivot record or not based on the field value of the survivorship record. If set to:

  • 'Yes': The survivorship function is executed only if the field value in the survivorship record is empty.

  • 'No': The survivorship function is executed even if the field value in the survivorship record is not empty. Default value: 'No'

Table 19: Survivorship policy properties

Survivorship field

Configure field merges.

Survivorship field table (logical name: SurvivorFieldRule).

Properties

Definition

Survivorship field code

Any naming convention without white spaces can be used.

Survivorship policy

A reference to the 'Survivorship policy' table.

Field

A field in the table for which a survivorship function is defined.

Survivorship function

A function that is applied to the field to select the best value amongst a list of records. For example, in an 'Employee' table, a maximum value function can be applied to the age field to automatically select the greatest age amongst the related suspect records. These functions take into account any conditions specified in the 'Condition for field value survivorship' property. For a list of survivorship functions along with their description, please see Predefined survivorship functions.

Condition for field value survivorship

The condition is defined as a predicate expression. The Survivorship function will be checked first before moving on to Condition for field value survivorship. A record must satisfy both the Survivorship function and the predicate expression to be survived.

If the value of this field is empty or invalid, it will be ignored by the add-on. In case, the predicate expression is invalid, it will be logged in the add-on log file.

Executed if empty only

Decision on whether to apply survivorship function to merge data to Golden/Pivot record or not based on the field value of the survivorship record.

If 'Yes': The survivorship function is executed only if the field value in the survivorship record is empty.

If 'No': The survivorship function is executed even if the field value in the survivorship record is not empty.

Default value: 'Yes'

Table 20: Survivorship field properties

Survivorship field function

Survivorship function table (logical name: SurvivorshipFunction).

This table is located under the EBX® Administration Tab in the 'Matching reference data' dataset.

Properties

Definition

Name

The function's name.

Description

The function's objective.

Is prebuilt

If 'Yes': This function is provided by the add-on.

If 'No': This function is not provided by the add-on.

Java class

Specifies a Java reference to provide a custom algorithm.

Table 21: Survivorship function properties

Special notation:

/100002010000000E0000000E3C009B93.png

Currently for survivorship functions, if there are more than one value can be survived, the value of the record with the latest timestamp will be selected by default.

Predefined survivorship functions

The add-on provides predefined survivorship functions to drive the merge at the field level. When no function is defined, the 'Best score' function is used by default.

Survivorship function

Description

Best score

The field value used is that of the record with the best score amongst all related suspects.

Constant

The field value used is set to value defined as constant, whatever merged records values.

Most trusted source

The field value is taken from the record with the best trusted source amongst all related clusters. If there are two sources with the same priority, the add-on will determine the most trusted based on latest timestamp.

If this field is not declared in the 'Field trusted source' table, then the merge for this field is not executed.

Longest

This field value is taken from the record with the biggest field value size (string).

The length of a multi-value field is counted as the number of elements in the list.

Max

The field value used is the maximum value amongst all related suspects.

Min

The field value used is the minimum value amongst all related suspects.

Most frequent

The field value used is the value found most frequently amongst all related suspects. Note that this function ignores the value from an auto-created record.

Most recently acquired

The field value is taken from the last timestamps record.

No merge

The field value is not merged

Table 22: Predefined survivorship functions

Record selection policy

Introduction

A record selection policy defines the method used to select a pivot record for the matching procedure and to select a survivorship record.

Pivot selection

A pivot record selection policy is specified when configuring a matching policy.

By default, the add-on uses the new or updated record as the pivot record for computing the match score. This is called the 'newUpdate' policy.

If a policy other than 'newUpdate' is selected, that policy will be enforced after the 'newUpdate' policy has been applied.

Special notation:

/100002010000000E0000000E86672C34.png

The survivorship will override 'Pivot selection mode'.

Survivorship record selection

When the survivorship process is started, the add-on has to select the record used as the survivorship, or target record. When the record selection policy is 'newUpdate', all suspects to be merged will be merged into the pivot. If another policy is then executed, the pivot is no longer the survivorship record.

Record selection policy

Record selection (logical name: RecordSelection).

This table is located under the EBX® Administration Tab in the 'Matching reference data' dataset.

Properties

Definition

Name

The record selection policy name.

Description

The record selection policy description.

Is prebuilt

If 'Yes': This record selection policy is provided by the add-on.

If 'No': This record selection policy is not provided by the add-on.

Java class

Specifies a Java reference to provide a custom algorithm.

Table 23: Record selection properties

Predefined record selection policies

The add-on provides these ready-to-use selection policies. When a record selection policy does not provide any records, then the policy 'newUpdate' is used by default except if another one is mentioned in the table below.

Record selection policy

Description

Best score

Record with the best score after the pivot record

Longest

Record with most information (size). All fields (string data type) are taken into account except matching metadata.

The length of a multi-value field is the length of longest element in the list.

Most complete

The record with fewest empty fields.

Most complete and longest

Record with the fewest empty fields and with the most information (size). All fields are taken into account except matching metadata. Most complete takes priority over the longest.

Most recently acquired

The record with the latest timestamp managed through the matching metadata tab.

Most trusted source

The record with most trusted source.

New update

The record created or modified.

Was golden

The record is selected amongst 'Was golden' records and current 'Golden'. If many records with the 'Golden' or 'Was golden' identifier exist, the 'Most recently acquired' record is used.

Table 24: Predefined record selection policies

Trusted sources

A source identifies a system that provides data either at the record or field level.

For instance, in a company, the employee's records are provided either by the human resources system (source 1) or a manufacturing system (source 2). For every record, the salary field can be also provided by the accounting system (source 3). Based on these three sources, it is possible to declare trusted levels such as:

The add-on uses the trusted level mechanism in these procedures:

The field stating the source depends on each table under add-on control. The configuration of this field is set in the 'Table' table (logical name: TableConfiguration) through the 'Source' parameter field.

Trusted source configuration

Note

If no specific fields are defined as a trusted source, the add-on applies the trust level set for the table to all of its fields.

Source

Source (logical name: Source).

Properties

Definition

Name of source

Business source name – any naming convention can be used and the add-on compares this value with a record's source field value.

Description

Description of the source

Table 25: Source properties

Table trusted source

Table trusted source table (logical name: TableTrustedSource).

Properties

Definition

Table

A table under the control of the add-on.

Trusted source list

List of sources ordered by trust

Table 26: Table trusted source properties

Field trusted source

Field trusted source table (logical name: FieldTrustedSource).

Properties

Definition

Table

A table controlled by the add-on.

Field

A field in the table controlled by the add-on.

Trusted source list

List of sources ordered by trust

Table 27: Field trusted source properties

Language management

Each table under add-on control is associated with one of the languages defined in 'ebx.locales' (the languages managed by the EBX®). The matching process uses this language.

The language configuration is declared in the table configuration.

Synonym configuration

The configuration of synonyms is available in the 'Semantic' data domain.

When configuring a matching field policy, you can refer to a group of synonyms used for the de-duplication process. The add-on provides some sample synonyms.

Synonym

A synonym is any phrase that will be considered as another's synonym. The synonym items are synonyms with each other within a group of synonyms (refer to the table 'Synonym group'). To attach synonyms to a group, the data hierarchy view 'Synonym by group' is used.

Source (logical name: Item).

Properties

Definition

Code

Any naming convention except the prefix '[ON]' that is reserved for the synonyms provided by the add-on can be used.

Name

The name of the synonym.

Table 28: Synonym

Synonym group

The synonym items that are synonym with each other are grouped into a Synonym group.

Source (logical name: Synonym group).

Properties

Definition

Code

Any naming convention except the prefix '[ON]' that is reserved for the synonyms provided by the add-on can be used.

Name

The name of the synonym group. The 'Synonym by group' data hierarchy view on the 'Synonym' table, is used to attach the items to a group.

Parent group

Groups of synonyms can be arranged through a parent relationship. At the time you use a synonym group for matching a field, you can configure the matching policy to look for synonyms through the parent relation.

Table 29: Populate synonym group

Populate synonym group

This table shows the relationships between synonym items and synonym groups. A synonym group may contain many synonym items.

Properties

Definition

Synonym group

A record of Synonym group table.

Synonym

A record of Synonym table.

Table 30: Populate synonym group

Matching operations

In this section, you can find information on matching operations and their contexts for use. Sections are divided into operations that apply to a single record and to those that apply to a set of records.

Operations applied to a single record

List of operations

Type and context of use

Match cluster

Matching operations applied to a single record are not available if the process policy is incorrectly configured or the 'On matching Process' property is set to 'No'.

EBX® standard services remain available from the EBX® view. It is possible to duplicate, compare and delete a record. Because the deletion is physical, it can entail integrity constraint failures. For example, if a merged record holds a foreign key to the record that has been deleted, then EBX® will raise integrity errors.

You can make the EBX® delete service inaccessible based on user permission levels. This helps to ensure that a logical deletion with the add-on is always used.

Note that if a suspect record is manually removed from a cluster using one of these operations and the cluster contains only pivot and merged records, the pivot becomes golden.

Match table

Match suspicious

Create new golden

Set golden

Set back golden

Set definitive golden

Unset golden

Not suspect

Merge

Automatic merge

Unmerge

Switch pivot

Remove from cluster

Remove and set golden

Add into cluster

Set merged (ignore)

Cancel ignore

Delete

Undelete

Align foreign key of merged record

Push out suspect (-1)

Display meta-data (also available at the table view )

UI action applied to single record.

Manage cluster

Show cluster

Show cluster for merge

Display merged records, Display relation records

Display record

Hide cluster

Modify record

Fix relation records (for pivot and golden when under matching)

Clean up merged fields log

Table 31: List of operations applied to single record

On suspicious

Suspicious records are located into the '004' cluster.

Suspicious

Match cluster

N/A

Match table

Matching against all records in the table with unmatched state (in the '000', not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed.

Match best cluster

Matching against Pivot and Golden (with cluster ids > 10). The record will be moved to the cluster that has the best score. The current Pivot or Golden is still prioritized as Pivot and Golden. A Match table operation will be initiated if no Pivot or Golden record matches the record.

Match suspicious

Simple matching against all records in the table with unmatched state (in the '000' cluster, not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed. The suspicious record is then either associated to one to many potential suspect records or directly considered a golden record.

Create new golden

N/A

Set golden

Record is declared as golden and moves to the '001' cluster.

Set back golden

N/A

Set definitive golden

Same as 'Set golden' but the record moves to the '003' cluster.

Unset golden

N/A

Not suspect

N/A

Merge

N/A

Automatic merge

N/A

Unmerge

N/A

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

The record becomes a suspect record. It is integrated into a selected cluster along with its merged records and its score is computed against the pivot or golden record. Regardless of the result of this score, the record is kept in the cluster.

If there is no pivot or golden record in the selected cluster, then the new record becomes the pivot record. If a pivot already exists, the new record is put into the suspect state. If a golden record already exists, the golden record becomes the pivot record and the new record is put into the suspect state. No automatic merge is executed.

It is not possible to add the record into a cluster that is used to group unmatched (to be matched) records.Note that if you move a record other than Merge into a new cluster, its state will be changed to Golden.

Set merged (ignore)

N/A

Cancel ignore

N/A

Delete

Set to deleted and remains in the current cluster.

Undelete

N/A

Align foreign key

N/A

Push out suspect (-1)

N/A

On suspect

Suspect

Match cluster

N/A

Match table

Matching against all records in the table with unmatched state (in the '000' cluster, not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed.

Match suspicious

N/A

Create new golden

N/A

Set golden

The record is declared golden. All other records in the cluster (except merged and deleted) are marked as merged without a merge operation. The score is recomputed for each record in the cluster against the new golden record. Records are kept in the cluster.

Set back golden

Available only if 'Was golden'= 'Yes'. The record becomes golden and is placed into the '001' cluster without any matching or merging. If the Auto create new golden for single golden property is active, the add-on creates a new golden record and updates the current record to merged. The remaining records in the impacted cluster will be fixed.

Set definitive golden

Same as 'Set golden' but the record moves to the '003' cluster. The current cluster can have only merged records since the golden record has moved to the '003' cluster.

Unset golden

N/A

Not suspect

Record is no longer a suspect against the pivot or golden record. The record becomes unmatched and is moved to the '000' cluster.

Merge

Record is merged into the pivot using selected fields in the suspect record (selected in the merge UI). One record is merged at a time. No survivorship rules are applied.

Automatic merge

N/A

Unmerge

N/A

Switch pivot

The current record becomes the pivot. The former pivot becomes a suspect record. New scores are recomputed for all records in the cluster against the new pivot. These records are kept in the cluster regardless of the new results (score='-1' is no match). Automatic merge is not executed.

Remove from cluster

The record is removed from the cluster, set to unmatched, and placed into the '000' cluster. The cluster may have a pivot with no suspects.

Remove and set golden

The record is removed from the cluster, set to golden, and placed into the '001' cluster. If the Auto create new golden for single golden property is active, the add-on creates a new golden record and updates the current record to merged. The remaining records in the impacted cluster will be fixed.

Add into cluster

N/A

Set merged (ignore)

The record is set to a merged state without performing any merge operations. This is one way of rejecting records (ignore). The link to the target record is set to null.

Cancel ignore

N/A

Delete

The record is set to deleted and remains in the current cluster with its current score. If there are no remaining Suspects in the cluster, the Pivot becomes Golden.

Undelete

N/A

Align foreign key

N/A

Push out suspect (-1)

N/A

On pivot

Pivot

Match cluster

Matching against suspect records in the cluster. No automatic merge. Suspects are kept in the cluster regardless of the score ('-1' if the minimum threshold to be considered as a suspect).

Match table

Matching against all records in the table with unmatched state (in the '000' cluster, not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed.

Match suspicious

N/A

Create new golden

A new pivot record is created if all primary key fields are auto-incremented or done manually from a pop-up. It replaces the existing pivot and the existing pivot becomes a suspect. The Merge view automatically displays.

Set golden

The record becomes golden. All other records in the cluster (except merged and deleted) are set to merged without any merge. Score is recomputed for each record in the cluster against the new golden record.

Set back golden

Available only if 'Was golden'= 'Yes'. Record becomes golden and is placed into the '001' cluster without any matching or merging. The pivot then becomes the most recently updated record. New scoring is executed. All existing suspect records are kept in the cluster whatever the score.

If the Auto create new golden for single golden property is active, the add-on creates a new golden record and updates the current record to merged. The remaining records in the impacted cluster will be fixed.

Set definitive golden

Same as 'Set golden' but the record moves to the '003' cluster. The current cluster can have only merged records.

Unset golden

N/A

Not suspect

N/A

Merge

N/A

Automatic merge

All suspect records in the cluster are merged into the pivot using defined survivorship policies. The record becomes a golden record.

Unmerge

The EBX® Data history function must be active on the table.

Value changes revert to that of the pivot record before its last merge, but only if the last merged record's timestamps are still equal to the pivot record timestamp. If these timestamps differ, unmerging is no longer possible. Merged records become suspect records and the pivot record remains in its state.

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

N/A

Set merged (ignore)

N/A

Cancel ignore

N/A

Delete

The record's state is set to deleted, the record remains in the current cluster. The cluster can have no pivot. Every suspect is set to '-1'. If On suspect record retention is activated:

  • And there is only one Suspect, the Suspect becomes Golden.

  • And there are multiple Suspect records, the add-on selects a new Pivot and executes a Match cluster operation to recalculate the score.

Undelete

N/A

Align foreign key

Realign foreign keys of merged records of the pivot in the dataspace.

Push out suspect (-1)

All suspect (-1) records of the cluster move to the unmatched cluster. If there are no remaining Suspect record, the Pivot becomes Golden.

On golden

Located either in non-predefined clusters or in '001' (golden) or '003' (definitive golden).

Golden

Match cluster

N/A

Match table

Matching against all records in the table with unmatched state (in the '000' cluster, not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed. If matching does not find a potential duplicate record, the golden record remains in its current cluster.

Match suspicious

N/A

Create new golden

N/A

Set golden

N/A

Set back golden

N/A

Set definitive golden

Available only if the record is not in the '003' cluster. The record moves to the '003' cluster. The current cluster (if different from '001') can have only merged records since the golden has moved to the '003' cluster.

Unset golden

Record becomes pivot. All other merged records in the cluster have 'Target record' field points to the Golden are set to suspect and the target record is set to 'null'. For a golden record in the '001' and '003' clusters, the record moves to the '000' cluster as unmatched. For golden from '003' former merged records are set to suspect with a score set up to '-1'.

Unset golden recursively

Record becomes pivot. All other merged records in the cluster are set to suspect with scores before being merged.

Not suspect

N/A

Merge

N/A

Automatic merge

N/A

Unmerge

The EBX® Data history function must be active on the table.

Changes the value back to that of the golden record before its last merge, but only if timestamps of the last merged records are still equal to the timestamp of the golden record. If these timestamps differ, unmerging is no longer possible. Merged records become suspect records and the golden record becomes a pivot record.

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

Record becomes the new pivot in the cluster, match cluster is executed, all suspects are kept and merge is not executed. Its merged records are also moved to the new cluster without changing states. It is not possible to add the record in a cluster that is used to group unmatched (to be matched) records. Note that if you move a golden record into a new cluster, its state will remain the same.

Set merged (ignore)

N/A

Cancel ignore

N/A

Delete

The record's state is set to deleted and the record remains in the current cluster with its current score. The cluster can have merged records only.

Undelete

N/A

Align foreign key

Realigns foreign keys of the merged records in the dataspace.

Push out suspect (-1)

N/A

On merged

A merged record is located in a cluster different from reserved clusters ('000' to '010').

Merged

Match cluster

N/A

Match table

N/A

Match suspicious

N/A

Create new golden

N/A

Set golden

N/A (even if Was golden = yes)

Set back golden

N/A

Set definitive golden

N/A

Unset golden

N/A

Not suspect

N/A

Merge

N/A

Automatic merge

N/A

Unmerge

N/A

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

The record is integrated into a selected cluster a long with its merged records without changing states and scores.

Set merged (ignore)

N/A

Cancel ignore

Available only if the link to the target record is null. The record becomes a suspect record and remains in the current cluster with its current score.

Delete

N/A

Undelete

N/A

Align foreign key

N/A

Push out suspect (-1)

N/A

On unmatched

An unmatched record is located in the '000' (unmatched) predefined cluster or located in a normal cluster through the 'Group at once unmatched' operation.

Unmatched

Match cluster

N/A

Match table

Matching against all records in the table with unmatched state (in the '000' cluster, not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed.

Depending on the record scores in the table, the unmatched record may move to another cluster, including the '001' cluster if it directly becomes a golden record.

Not available for unmatched records that have been grouped by the 'Group at once unmatched' operation.

Match best cluster

Matching against Pivot and Golden (with cluster ids > 10). The record will be moved to the cluster that has the best score. The current Pivot or Golden is still prioritized as Pivot and Golden. A Match table operation will be initiated if no Pivot or Golden record matches the record.

Match suspicious

N/A

Create new golden

N/A

Set golden

The record becomes golden and moves to the '001' cluster.

Set back golden

N/A

Set definitive golden

The record becomes golden and moves to the '003' cluster.

Unset golden

N/A

Not suspect

N/A

Merge

N/A

Automatic merge

N/A

Unmerge

N/A

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

The record becomes a suspect record. It is integrated into a selected cluster a long with its merged records and its score is computed against the pivot or golden record. Regardless of the result of this score, the record is kept in the cluster.

If there is no pivot or golden record in the selected cluster, then the new record becomes the pivot record. If a pivot already exists, the new record is put into the suspect state. If a golden record already exists, the golden record becomes the pivot record and the new record is put into the suspect state. No automatic merge is executed.

It is not possible to add the record to a cluster that is used to group unmatched (to be matched) records. Note that if you move a record other than Merge into a new cluster, its state will be changed to Golden.

Set merged (ignore)

N/A

Cancel ignore

N/A

Delete

The record's state is set to deleted; the record remains in the current cluster.

Undelete

N/A

Align foreign key

N/A

Push out suspect (-1)

N/A

On deleted

A deleted record can be located in any cluster.

Deleted

Match cluster

N/A

Match table

N/A

Match suspicious

N/A

Create new golden

N/A

Unmatch

N/A

Set golden

N/A

Set back golden

N/A

Set definitive golden

N/A

Unset golden

N/A

Not suspect

N/A

Merge

N/A

Automatic merge

N/A

Unmerge

N/A

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

N/A

Set merged (ignore)

N/A

Cancel ignore

N/A

Delete

N/A

Undelete

The record is set to unmatched state in the '000' cluster with the score '-1'.

Align foreign key

N/A

Push out suspect (-1)

N/A

On to be matched

A record that is ready to be matched is located in the '002' predefined cluster, or located in a normal cluster through the 'Group at once to be matched' operation.

To be matched

Match cluster

N/A

Match table

Matching against all records in the table with unmatched state (in the '000' cluster, not in groups), suspect, pivot and golden. Other states that are not used to execute the match include: to be matched, suspicious, definitive golden, merged and deleted. New scores are computed.

Depending on the scores of the records in the table, the record in the 'To be matched' state may move to another cluster including the '001' cluster if it directly becomes a golden record.

Not available for 'To be matched' records that have been grouped by the 'Group at once unmatched' operation.

Match best cluster

Matching against Pivot and Golden (with cluster ids > 10). The record will be moved to the cluster that has the best score. The current Pivot or Golden is still prioritized as Pivot and Golden. A Match table operation will be initiated if no Pivot or Golden record matches the record.

Match suspicious

N/A

Create new golden

N/A

Unmatch

N/A

Set golden

N/A

Set back golden

N/A

Set definitive golden

N/A

Unset golden

N/A

Not suspect

N/A

Merge

N/A

Automatic merge

N/A

Unmerge

N/A

Switch pivot

N/A

Remove from cluster

N/A

Remove and set golden

N/A

Add into cluster

N/A

Set merged (ignore)

N/A

Cancel ignore

N/A

Delete

The record's state is set to deleted and it remains in the current cluster.

Undelete

N/A

Align foreign key

N/A

Push out suspect (-1)

N/A

UI operation applied to single record

UI operation applied to single record

Description

Display metadata

Opens a pop-up window showing record metadata values. This service is also available at the table view level.

Manage cluster

Opens the cluster with all its records.

Show cluster

Displays all records contained in the cluster in the bottom window.

The bottom window is a cumulative view. One to many clusters can be displayed.

Show cluster for merge

Displays all records contained in the cluster in the bottom window so that the 'Merge' button is readily available.

Clean up the list of 'Not suspect with'

Removes all records in the metadata tab's 'not suspect with' property. You can only see this service when at least one record holds the 'not suspect' designation with the current record (except Merged and Deleted).

Display merged records

Displays all merged records.

Fix relation records

Allows you to fix the relation records

Clean up merged field log

Removes all merged field information logged in Metadata. This service is only visible when at least one record merged into the current record (except Merged and Deleted).

Align foreign key of merged records

Updates foreign key references from Merged records to Pivot or Golden records. This service can be used to align the foreign key of a recursive merged record if the 'Was golden' field value is set to 'True' on the target record.

Hide cluster

Empties the bottom window.

Modify record

Opens the record tabular view.

Display relation records

When a relation match is configured, this service allows you to display the 'Relation table' records.

Fix relation records

The sections below show how to fix related records and which operations to apply.

Enabling the fixing of relation records

When a table holds the matching metadata type, the 'Fix relation records' service is available. This service allows you to arrange and clean records that are considered 'relation records' to this table across dataspaces and datasets. These records hold a direct or indirect (through a join table) relation to this table.

For instance, a join table 'EmployeeCompany' is used to define the relations between employees and companies. Once the 'Employee' table has been extended with the matching metadata type, the 'Fix relation records' service becomes available in the light and full matching view menu (see below, Stewardship UI).

/fix_relation_service.png

When the table is under the matching process the 'Fix relation records' service is available for records with 'Pivot' and 'Golden' matching states only. The add-on process is then fully applied. When the table is not under the matching process, the 'Fix relation records' service is available for all matching states. In this situation the add-on process is no longer applied.

Operations to fix the relation records

You can apply one of these actions to the "relation record":

Not suspect with

The 'Not suspect' service is only available for suspect records.

Once this service is executed on a record, the record is no longer considered as a suspect compared to its pivot or golden record. This has an effect not only for the current match procedure, but for future ones as well.

The add-on keeps a list of non-suspect records for each pivot and golden record. The 'Not suspect with' matching metadata is populated automatically by the add-on.

Records can manually be removed from this list. Records must only be added using the 'Not suspect' service.

This is an example of the 'Not suspect' mechanism.

/10000000000002F60000008E8776D135.png

The records 3, 4 and 5 will never be considered as a suspect against record 1 during matching. Record 4 is now a pivot, but was previously a suspect against record 1.

When record 1 subsequently changes states, it maintains its 'Not suspect with' list.

'Not suspect with' lists are not modified when referenced records are modified or logically deleted. When a physical deletion is executed then the list is updated to avoid the risk of breaking integrity.

The score of the record against its 'Not suspect with' record is kept by the add-on so that future scoring can be compared with previous scores. If the difference between a former score and a new score is higher than the 'notSuspectComeBackSuspectThreshold' value (Process Policy) then the record is removed from the 'Not suspect with' list and its state returns to suspect.

Operations applied to a set of records at once

Types of operations

Type and context of use

List of operations

These operations are not available when a process policy has an incorrect configuration. However, they remain usable when the 'On matching Process' is set to 'No'.

Match at once (suspicious)

Match at once (to be matched)

Match at once (unmatched)

Exact match at once (to be matched)

Exact match at once (unmatched)

Group at once (to be matched)

Group at once (unmatched)

Group collapse (to be matched)

Group collapse (unmatched)

Align foreign key of all merged records

Table 32: List of operations applied to a set of records at once

/list_of_table_services.png

'Unmatched' records are created either when matching is deactivated for a table through the 'On matching process' parameter in the table configuration, or when the 'Remove from cluster' operation is performed.

'To be matched' records are created when importing bulk data. Then, a policy should be configured with the 'Is import mode' property set to 'Yes'. This allows the add-on to set every imported record to 'To be matched'.

'Suspicious' records are created when the simple matching configuration is used.

Match at once

Four operations are available for launching matches on a set of records at once:

For the 'unmatched' and 'to be matched' records, two execution levels are available:

Special notation:

/100002010000000E0000000E3C009B93.png

It is highly recommended to configure a special matching policy to use for the 'Match at once' operation ('For match at once' property in the 'Matching policy' table). If this policy is not configured, then the matching policy will be selected based on the context of records (the field values configured in table 'Matching policy context'), it can entail defect in the matching results.

In case the 'Record creation matching state' and 'Record modification matching state' indicators of the EBX® Insight Add-on are configured, their automatic execution may affect 'Match at once' performance.

Match by states

By default, matching is applied solely on the scope unmatched records (to be matched, unmatched or suspicious depending on the operation).

/match_by_state.png

Using a 'base' configuration allows you to define from which record states the 'Match at once' operation is driven. For example, if the base is 'Golden', then golden records are matched against the unmatched (or to be matched, suspicious) records. The order of the selected states is significant since matching is executed for the first state, then the second, etc.

The 'Maximum number of match per state' parameter is used to limit the number of records matched for each state configured in the base. It allows you to avoid executing the 'Match at once' operation on all existing records in the table.

Special notation:

/100002010000000E0000000E3C009B93.png

The 'Match at once' operation ignores the records that are located in the groups.

Example of match at once

List of operations

Match at once (unmatched) with base = golden, unmatched

All golden records are matched against all unmatched records. Once this match is terminated, the rest of the unmatched records are matched with each other.

Match at once (unmatched) with base = golden

Same as the previous example but the rest of unmatched records are no longer matched.

Match at once (unmatched)

All unmatched records are matched with each other.

Table 33: Examples of match at once operations

Special notation:

/100002010000000E0000000E3C009B93.png

The 'Match at once' operations handle the records to be matched as a stack of records. In other words, a record is matched only one time during the whole process.

/100002010000000E0000000E3C009B93.png

To streamline the process of the 'Match at once' operation, it is recommended to:

  • Use only one matching algorithm level, meaning that the 'Second matching algorithm' property in the Matching policy used for the match at once operation, should be set to 'not defined'.

  • Deactivate the automatic merge process by setting the Process policy's 'On survivorship' property to 'No'.

  • Deactivate the 'Not suspect with' feature by setting the Process policy's 'On not suspect with' property to 'No'.

  • Use 'Full mode' with caution due to the exponential match operation execution: n*(n-1) match execution for n records.

Match by groups

It is possible to execute the 'Match at once' operation on a subset of unmatched records (or to be matched) that have been collected by the 'Group at once' operations.

/match_by_groups.png

Special notation:

/100002010000000E0000000E3C009B93.png

The match by groups option is available if there are existing groups (see the 'Group at once' operation to create groups of records).

When a large set of records must be matched, it is recommended to apply a rough matching policy to create groups. This requires a shorter execution response time. For example, using an Exact algorithm on a 'postal code' field to group customers by department. After that, the 'Match at once' operation is executed on the groups as explained above.

Special notation:

/100002010000000E0000000E86672C34.png

Parallel matching on several groups.

Match at once (on clusters)

You can execute the 'Match at once' operation on a single cluster or a set of clusters. Only clusters that have a ClusterID greater than 11 and do not contain a golden record display in the list of clusters.

/match_at_once_on_clusters.png

Special notation:

/100002010000000E0000000E86672C34.png

Matching policy for 'Match at once' is not applied when executing 'Match at once (on clusters)'.

Match with the 'Full mode' option

When you execute the 'Match at once' service and enable the 'Full mode' option, a match is run against every record in the table, regardless of any existing 'Match at once' configuration settings. This means that any property values which make up a "base" configuration are ignored. The number of matches executed is n*(n-1) for n involved records. This improves matching result relevance, but increases execution time.

In 'Full mode', the 'Force orphans golden' option is active as default. In this mode, all lone suspect or lone pivot records in a cluster are forced to golden.

When this option is not active, a record is matched only if its state value is within the 'Match at once' base configuration's parameters.

Using match at once without the 'Full mode' option

First step of the service

A, Unmatched

B, Unmatched

C, Unmatched

D, Unmatched

Match at once on A. The A record is a pivot and matched with B, C, and D. If a record becomes Suspect it is no longer used for the match. This is considered a "stack mechanism" to facilitate rapid matching. Applying an additional match on a Suspect record could entail new duplicates that are ignored in the current 'Match at once' operation.

Second step of the service (executed automatically)

A, Pivot

B, Suspect

C, Unmatched

D, Unmatched

The next match is done on C. The B record is not used to launch a match. This stack process streamlines the response time and results in a fast matching process.

Using match at once with the 'Full mode' option

First step of the service

A, Unmatched

B, Unmatched

C, Unmatched

D, Unmatched

Match at once on A. In 'Match at once' with the 'Full mode' option enabled all records in the table are matched one time and only one time, even if they are suspect, pivot or golden. The base configuration is not used.

Second step of the service (executed automatically)

A, Pivot

B, Suspect

C, Unmatched

D, Unmatched

Next match is done on B by applying the service 'Match table' on the suspect.

Special notation:

/100002010000000E0000000E86672C34.png

The 'Unmatched' records created during match at once with 'Full mode' option are matched against other records in the new cluster.

Match at once in 'Parallel mode'

When you activate Parallel mode, the add-on divides records into groups and executes matching on these groups simultaneously. This method can reduce matching response time. The options when running Match at once in parallel mode are defined in 'Process policy' table's 'Options for Parallel Match at once' tab. Please refers to Process policy section for more details. Note that the Parallel mode is hidden when replication is activated (only in commit mode).

/match_at_once_in_parallel.png

Note

The parallel matching option does not display on a historized table, or an existing replication table on submit mode.

Special notation:

/100002010000000E0000000E86672C34.png

Match at once parallel does not work on a historized table or a table which is configured a 'On submit' mode replication.

/100002010000000E0000000E3C009B93.png

Parallel matching is in beta mode, it is highly recommended that you work on a working dataspace instead of real data.

Follow the result of a match at once execution

The 'Clean all previous markers' parameter is used to retrieve records that have been handled by the match at once execution.

/get_list_records_by_match_at_once.png

When a record's state is modified during a 'match at once' operation, the add-on flags it. Then, it is possible to get the list of all records impacted by match at once operation execution(State button).

A record is considered as stemming from a 'match at once' operation until it is modified by any other operations or if the 'Clean all previous makers' option is set to 'Yes' when executing a 'match at once' operation (figure below).

/clean_all_previous_markers.png

Response time

When a table contains a large set of records, it is recommended to apply an incremental matching procedure (see Incremental matching procedure section). The following table describes three properties that have an impact on a match at once operation's response time.

Property

Description

Max number of records in cluster

(Table configuration)

When a set of records contains many suspect records, setting the 'Max number of records in cluster' property to a higher value improves response time.

When the 'Max number of records in cluster' property is set lower and the 'Maximum number of match table for each state' is higher, a match operation creates more clusters containing fewer records.

Stewardship min score

(Process policy)

The lower this threshold is, the faster the match at once operation will be. The pace of feeding clusters depends on the ability to make a record suspect as quick as possible.

If this threshold is too high it means that likelihood of a record to be a suspect is lower. This configuration can raise many matches of the same record against many other records in the table because of a lack of suspect detection.

Maximum number of match table for each state

('Match at once' operation)

The match at once operation is executed as many times as records are available to match. When an unbounded value is used for this property, it means that all records in the table will be impacted by the operation.

Table 34: Properties having an impact on the response time of match at once operations

This table gives some examples of possible response times for the match at once operation. The assumptions are as follows:

Note: obviously the response time depends on the quality of the execution platform as well. For this reason the following table remains just an example.

Property

Example

Max number of records in cluster=5

Stewardship min score=15%

Maximum number of match table for each state=10

Quick match to get an idea for response time on a limited matching coverage

Result in: 5 seconds

Max number of records in cluster=5

Stewardship min score=15%

Maximum number of match table for each state=100

Higher number of records to match. The response time should be about 10 times the previous one

Result in: 40 seconds

Max number of records in cluster=5

Stewardship min score=15%

Maximum number of match table for each state=10,000

For 10,000 records it should take about one hour (40 seconds * 100)

Max number of records in cluster=50

Stewardship min score=15%

Maximum number of match table for each state=100

With the 'Max. number of records' property set to a higher number, suspect record collection time improves.

Example: 3 minutes

Max number of records in cluster=50

Stewardship min score=15%

Maximum number of match table for each state=10,000

The same configuration as the previous one but applied to 10,000 records should take about 5 hours

(3 minutes * 100)

Table 35: Example of possible response time for a table with 100,000 records

Incremental matching procedure

Because there are many parameters that influence the 'Match at once' service execution time (see Response time section), it is recommended to run matching on successive subsets of records. This is achieved on all records directly in the table, or by creating groups of records (operations Group at once).

No use of groups of records

The following table shows an example for the execution of a 'match at once' operation applied on a set of 400,000 records.

Records

Matching Execution

/10000201000000F9000000BEEF38470D.png

Matching stage #1

All table records are set to 'unmatched' when importing data for the first time. On this table, a 'match at once (unmatched)' operation is executed with a limit of records set to 100,000. The stewardship for these records is then applied. Some of these records move to merged and can be deleted. Other are set to golden, pivot and suspects.

/10000201000000F7000000BD58C6D22A.png

A second 'match at once (unmatched)' operation is executed on the next 100,000 records. Only records with an unmatched state are used, meaning that all records that have been analyzed during the first matching execution are ignored.

/10000201000000F9000000BDA7154908.png

Matching stage #2

After the execution of four 'match at once (unmatched)' operations by sets of 100,000 records, there are 150,000 records holding merged and deleted states. To execute a second matching stage, all other records must be set to 'unmatched' (see the 'set at once' service). When applied to the 250,000 remaining records, the second matching execution stage is performed. The 'match at once (unmatched)' operation is directly executed on the scope of all 250,000 records, or on a smaller set depending on the matching strategy.

Table 36: Example of an incremental matching procedure

Use of groups of records

Rather than matching 100,000 records directly (see No use of groups of records section) with a fine grained matching policy, it is recommended to first apply a rough matching policy that groups all unmatched (or to be matched) records without changing their states. This rough policy can rely on the fastest matching procedure such as an Exact algorithm with one field only.

Once groups are created, then the match at once operation can be executed on every group to keep the response time to a minimum.

To create a group the 'Group at once' operations are available to apply to 'unmatched' or 'to be matched' records (see next section).

Synthesis of possible approaches to match large volumes of data

Different types of match

The same process can be applied to 'to be matched' and 'Suspicious'.

/10000000000004970000033D8C49A924.png

The whole process of matching large set of data

The same process can be applied to 'to be matched' and 'Suspicious'.

/10000000000004A900000338E68CD58E.png

Special notation:

/100002010000000E0000000E3C009B93.png

When you are dealing with a large volume of data, it is also possible to use the 'Exact match at once' operation to apply a direct data query in order to get groups of records based on exact data values (see Exact match at once section).

/100002010000000E0000000E3C009B93.png

You can filter data at the time the matching policy is executed. Fuzzy matching is then applied to the subset of records retrieved through the filter. Please see the 'Filter by' property.

Exact match at once

The 'Exact match at once' operation is executed on 'To be matched' or 'Unmatched' records.

This operation applies a full 'exact' matching based on the matching fields that are configured in the active matching policy. This means that the algorithms configured in the matching policies are bypassed to apply an exact matching.

Similar to the 'Exact' algorithm in the matching field configuration, the 'Exact match at once' operation, runs a direct data query that is not based on a fuzzy algorithm. Therefore the response time is dramatically improved but the matching results cannot be mixed with a fuzzy search. When many fields are configured to use matching then the 'Exact match at once' operation computes the record's score by using a direct 'and' operator.

The 'Exact match at once' operation creates clusters of records that match.

The records used to match come from the cluster reserved for the 'To be matched' records (or 'Unmatched') and the records that are pivot, suspect and golden. In other words, 'To be matched' and 'Unmatched' records that are already grouped are not used. The suspicious records are not used.

/exact_match_at_once_to_be_matched.png

The 'In memory' option forces the execution of the matching in a full memory mode for which all data is managed within a unique transaction. It improves the response time dramatically (more than ten times faster) but requires more memory use. This option is strongly recommended for large volume of data with million of records.

At the end of the execution, you can still have 'To be matched' or 'Unmatched' records when they have empty values for the exact matching fields or they are located already in existing groups.

Special notation:

/100002010000000E0000000E3C009B93.png

'Exact match at once' is a fast matching operation that can be used on large set of records to easily create initial clusters of records that share exact values on certain fields.

/100002010000000E0000000E3C009B93.png

The 'Exact match at once' operation does not work in conjunction with the 'Relation match' operation. With a direct foreign key relationship, the exact match is applied using the foreign key raw value not its label.

/100002010000000E0000000E3C009B93.png

Even if a matching policy includes a 'Filter by' configuration, the 'Exact match at once' operation ignores it. This is because 'Exact match at once' already behaves as a filter.

/100002010000000E0000000E86672C34.png

The 'Exact match at once' operation does not support matching on multi-valued fields.

Group at once

The 'Group at once unmatched' operation ('to be matched') allows you to group unmatched or 'to be matched' records. Contrary to matching execution, records that are grouped do not move to another state such as golden, pivot or suspect. They keep their states unmatched (to be matched).

Special notation:

/100002010000000E0000000E3C009B93.png

The maximum number of records in a group is defined by the 'Max number of records for group' property in the 'Table' configuration. It is recommended that this property is configured to maintain manageable groups sizes that depend on the context of use. For example, to have a chance to group 100,000 records in a minimum of 100 groups, the 'Max number of records for group' property must be set to '1,000'.

Groups creation

It is possible to configure a special matching policy for group at once execution ('For group at once' property in the Matching policy table). If this policy is not configured, then the default is used.

Most of the time, the group at once operation is used to create groups of records on which a further execution of the match at once operation will be applied. Then, the groups can be created from a rough matching policy (Exact algorithm, one matching field) that will streamline the execution response time.

The group at once operation takes into consideration all unmatched records (to be matched) in the table, whatever their cluster localization ('000' for unmatched, '002' for to be matched), including records already grouped. In other words, the execution of the group at once operation reshapes the existing groups for unmatched or to be matched.

/group_at_once_unmatched.png

The 'Maximum number of matches for group' property allows you to limit group at once operation match execution.

Groups collapse

It is possible to collapse a group by using the 'Group collapse (unmatched)' operation (to be matched). Then, all group records move to the reserved cluster '000' (unmatched) or '002' (to be matched) and the group dissolves.

/group_collapse_unmatched.png

Note: the number of records in a group is computed at the time of group creation. In order to keep the original information about the group size, it is not updated after the creation.

Follow the result of a group at once execution

The 'Clean all previous markers' parameter is used to retrieve records that have been handled by the match at once execution.

/get_all_unmatched_in_group.png

When a record is grouped during a 'group at once' operation, the add-on flags it. This allows you to retrieve a list of all records impacted by a group at once operation (button State, operations 'Unmatched in group' and 'To be matched in group').

A record is considered as stemming from a 'group at once' operation until it is modified by any other operation, or if the 'Clean all previous markers' option is set to 'Yes' when executing a 'group at once' operation (figure below). Then, a special indicator displays as highlighted below.

/grouped_special_indicator.png

Align foreign keys of all merged records

Any record that still has a relationship with another record after it has been merged has an integrity issue. Since merged records still remain in their tables, built-in validation performed by EBX® cannot be used to resolve this issue.

The EBX® Match and Merge Add-on provides two services to realign foreign keys:

These services check every foreign key in relation to the table managed by the add-on, including ones in the table on which the service is executed (in case of self-referencing relationships). The scope of these services includes all related dataspaces and datasets.

When a foreign key is linked to a merged record, the add-on automatically updates it with the correct target record in the appropriate dataspace and dataset. To achieve this process, the service relies on the matching metadata to keep track of the target record for each merged record. Note that the setting of a table's Relationship management property can affect this behavior as follows:

To verify whether the orphan relationships have been fully repaired in a repository, all obsolete merged records must be purged. Once these records are no longer physically present in the tables, built-in validation performed by EBX® can be relied upon to identify remaining orphan relationships.

As the 'Align foreign key' services can no longer be used once merged records have been purged, you can employ a child dataspace for EBX® validation checking. In the child dataspace, it is then possible to purge the merged records in isolation and correct any validation errors reported by EBX®. Once the results have been validated in the child dataspace, the corrections alone can be merged back into the parent dataspace, thereby leaving merged records untouched in the parent dataspace allowing 'Align foreign key' to be run again in the future.

Logical deletion of records

The add-on relies on a logical deletion of records (deleted state). From the add-on's point of view, it is better to keep the record deleted until a definitive decision for its physical deletion is made. At any time, a deleted record can be reset to unmatched if needed (undelete).

When a record moves to the deleted state, integrity is a concern if other records hold a link to this record. Since the record is not yet physically deleted, the EBX® validation procedure does not raise errors.

To enable EBX® to enforce the validation you need to physically remove all records with the deleted state. This can be achieved either by using the usual EBX® 'delete' service, or by executing the 'Purge at once (deleted)' operation supplied by the add-on.

Running survivorship on all clusters

From the Table services menu, you can execute the Run survivorship on clusters service to run survivorship on selected clusters. Only clusters that have pivot, golden and merged records will be taken into account. When you run this service, the add-on executes survivorship to merge data from merged records to golden, or pivot records.

Running a match from a table

You can execute a match from a table's Actions menu by selecting the Run match service. This service allows you to select one or more records to include in a matching operation. If you run the service without making any selections, matching runs on the entire table.

When you run the service and:

In the Run match screen, the Active selection field shows the number of currently selected records. You can use the Record to match against field to define which records the add-on will process against the active selection. The Specific states option allows you to select from available states. The matching operation then runs on records with the selected state. When you choose the Active selection option, the match runs only actively selected records (against each other). The Max no. of selected records property limits the number of records from the active selection that will be processed.

The following image shows the Run match screen:

/run-match-screen.png

Setting record state

The add-on allows you to set the matching state for records as follows: