A 'Crosswalk' operation allows you to create a cross-reference table based on a matching policy executed over multiple tables. For instance, the 'Client', 'Party' and 'Customer' tables have duplicate records and you want to create a cross-reference table that shows how the same record is represented in multiple tables. This feature is not used to merge the records but provides a convenient place to view the linked records. The 'Records linking analysis' service executes this function.
To enable the Crosswalk feature for a table, its Use crosswalk property needs to be set to Yes. This property is available in the Table domain located in the TIBCO EBX® Match and Merge Add-on dataspace under the Administration tab. You also need to configure a Crosswalk process policy and the related matching policies and matching fields. Once configuration is complete, you can execute the Records linking analysis service. This service is available from the Actions drop-down menu when viewing a table that has the Crosswalk feature enabled—as highlighted in the image below. The service can be applied either on the whole table or on selection.
The 'Crosswalk Policy' domain is used to configure the 'Records linking analysis' function applied to a table. It is available from the TIBCO EBX® Match and Merge Add-on dataspace located under the EBX® 'Administration' tab.
Without having to modify a Crosswalk matching field, you can use a workflow script task to declare a target dataspace for the 'Records linking analysis' service. There, you can define such parameters as: source dataspace, source dataset, source records, source filter class, target dataspace, target dataset, and an output parameter to store the results. The operation uses Tables and Fields that were earlier defined in the configuration. If the target dataspaces and datasets have not been registered with the add-on, the corresponding dataspaces and datasets in the configuration will be retrieved and used instead.
To specify a target dataspace using workflow script task, you must follow the following steps:
Configure a Crosswalk matching field.
Configure workflow script task [ebx-addon-daqa] Records linking analysis.
Define data context.
Then publish and launch workflow to finish.
The 'Target dataspace' field allows you to specify one, or enlist all the dataspaces that have been registered in the configuration. If you declare more than one dataspace, use a comma to separate them. Additionally, you must define the dataspaces in the same order as the configuration. For example: DataSpace1, DataSpace2, DataSpace3, etc. If you want to skip one, or more dataspaces, do not declare the dataspace in its intended position. The following example skips DataSpace2: DataSpace1, ,DataSpace3.
If you leave the 'Target dataspace' field undefined, the script uses the dataspace registered in the configuration. Please refer to the images below for examples.
If both the 'Source record(s)' and 'Source filter class' fields are not defined, the 'Records linking analysis service' runs on all records in the table. Note that the 'Source record(s)' has higher priority than 'Source filter class'.
A crosswalk process policy defines a set of parameters used to execute the 'Records linking analysis' service. Multiple crosswalk policies can be defined on a table, but only one can be active at a time.
Source (logical name: CrosswalkProcessPolicy).
Properties | Definition |
---|---|
Crosswalk process policy code | Any naming convention without white space can be used. |
Description | The description of crosswalk process policy. |
Source table | Table on which the crosswalk process policy is defined. |
Active | If set to 'Yes': This crosswalk process policy is used. If set to 'No': This crosswalk process policy is not used. Default value: 'Yes' |
Match in source table | Determines if the crosswalk policy can match records within the source table or not. In an example configuration, a 'Client' table is the source table and the 'Records linking analysis' service is configured to match against the 'Customer' table. In this instance, if the 'Match in source table' property is set to 'No', the returned results will show only records considered duplicated between the two tables. If you want to also check for duplicates within the 'Client' table itself, then the 'Match in source table' property must be set to 'Yes'. |
Keep previous result | If set to 'Yes': The previous results of the 'Records linking analysis' service are kept in the result tables ('Crosswalk' and 'Crosswalk additional result' tables). If set to 'No': The previous results of the 'Records linking analysis' service are replaced by the new execution results (located in the 'Crosswalk' and 'Crosswalk additional result' tables). Default value: 'No' |
Maximum number of results | The maximum number of results saved in the 'Crosswalk additional result' table. When no value is defined, all results are saved. Default value: '20' |
Stewardship min scope (%) | When the matching score is lower than this threshold then the record is not considered a match. |
Threshold second matching (%) | When the score computed by the first algorithm is between 0% and the value specified by the 'Threshold second matching(%)' property, then the second algorithm is applied to detect potential false negative records. For example, if 'Threshold second matching' is set to '5%' then the second algorithm executes each time the score of the first algorithm is lower than '5%'. Constraint: 'Threshold second matching' is lower than or equal to 'Stewardship min score'. This property is used when 'Second matching algorithm' in the crosswalk matching policy is not empty. Default value: '0' |
2 nd level stewardship min score (%) | When the second matching level is configured to find false negative records from a first level matching, then this threshold is used to evaluate the matching score and whether or not it provides a suspect record. This property is used when 'Second matching algorithm' in the crosswalk matching policy is not empty. |
Table 44: Crosswalk process policy
A crosswalk matching policy is used to configure how matching executes to detect duplicate records and then issues the records linking analysis.
Source (logical name: CrosswalkMatchingPolicy).
Properties | Definition |
---|---|
Crosswalk matching policy code | Any naming convention without white space can be used. |
Description | The description of crosswalk matching policy. |
Source table | Table on which the crosswalk matching policy is defined. |
Active | If set to 'Yes': This crosswalk matching policy is used. If set to 'No': This crosswalk matching policy is not used. Default value: 'Yes' |
Main matching algorithm | The default algorithm for all crosswalk matching fields that use this crosswalk matching policy. If required, the main matching algorithm can be changed at the field-level (see table 'Crosswalk matching field'). Selection of a main algorithm is mandatory. |
Second matching algorithm | The algorithm used to manage records that could be false negative results after main algorithm execution. Used after the first matching procedure, this algorithm ensures that no matching records have been missed. You cannot override this matching algorithm at the field-level. |
Threshold matching | These threshold matching values are not mandatory. They are used instead of the corresponding values that are defined at the crosswalk process policy level. If one field is provided, then the add-on will use all of them rather than the values specified at the crosswalk process policy level. In this case, the three values become mandatory. |
Stewardship min score (%) | If undefined, the value configured at the Crosswalk process policy level is used. When the matching score is lower than this threshold, the add-on does not consider the record a match. |
Threshold second matching (%) | If undefined, the value configured at the Crosswalk process policy level is used. When the score computed by the first algorithm falls between 0% and the value specified by the 'Threshold second matching(%)' property, then the second algorithm is applied to detect potential false negative records. For example, if 'Threshold second matching' is set to '5%' then the second algorithm is executed each time the score of the first algorithm is lower than '5%'. Constraint: 'Threshold second matching' must be lower than or equal to 'Stewardship min score'. This property is used when the 'Second matching algorithm' property in the crosswalk matching policy is not empty. |
2 nd level stewardship min score (%) | If undefined, the value configured at the Crosswalk process policy level is used. When the second level of matching is configured to seek false negative records from the first matching, then this threshold is used to evaluate if the matching score provides a matched record or not. This property is used when the 'Second matching algorithm' property in the Crosswalk matching policy is not empty. |
Table 45: Crosswalk Matching Policy
Configuration of the fields used for the crosswalk matching execution time. A 'Crosswalk matching field' is attached to a 'Crosswalk matching policy'.
Source (logical name: CrosswalkMatchingField).
Properties | Definition |
---|---|
Crosswalk matching field code | Any naming convention without white space can be used. |
Crosswalk matching policy | Reference to the 'Crosswalk matching policy' table for which the crosswalk matching field is defined. |
Field | A field in the table that is declared by the referenced Crosswalk matching policy. This field is used during the matching procedure. If the field value is empty at execution time, then matching is not enforced. It is possible to match inside a list. Two lists of values will match when at least one value is exactly the same in the two lists. For instance (John, Carl, Paul) will match with (Frank, Theo, John). |
Crosswalk fields | This part of the configuration allows you to declare the fields of the target tables that must be matched with the 'Field' declared in the previous property. The records linking analysis will provide the cross reference between the record of the main table and the tables of the crosswalk fields declared here. When using a table in more than one Crosswalk matching field, it must occupy the same location in each field. For example, a field from a Party table may be defined first in one Crosswalk matching field. If you use a different field from the Party table in another Crosswalk matching field it must also hold the first position. |
dataspace | dataspace where the target table is located. After selecting your desired dataspace, you must click 'Save' to make options available in 'dataset', 'Target table' and 'Field' below. |
dataset | Date set where the target table is located. |
Table | The target table. |
Field | The target field, also known as 'Crosswalk field'. |
Matching algorithm | By default, the algorithm used is the one defined as the main algorithm in the 'Crosswalk matching policy' table. This default value can be changed to select another algorithm. |
Score weight | When several fields are used for matching, the score weight property gives a weight to each field score that the add-on uses to compute a weighted average. Example: '0.5' means the score of the field is worth half (50%). If a matching field is empty then the weighted average does not take into account this field. Default value: '1' |
Use synonym group | A group of synonyms can be selected. If a field value in the suspect record does not match the pivot record, then a new matching is achieved with the synonym values rather than the initial value in the suspect field. The first positive match is used to get the final score. |
Check synonym in all groups | If set to 'true': When matching uses the synonym mechanism, all child synonym groups are used to look for a synonym. If set to 'false': When matching uses the synonym mechanism, the synonyms are looked up within each separate child synonym group. |
Table 46: Crosswalk Matching Field
To select target tables in the list of configured tables under 'Crosswalk fields' group. You can configure a set of crosswalk fields at once and then select the target table from 'Record linking analysis' screen before running crosswalk. The add-on matches the source table against the selected target tables.
A dedicated dataspace Match and Merge - Crosswalk result is used to store the results of running the 'Records linking analysis' service.
The first twenty linked records are saved in a flat data structure in the 'Crosswalk' table that is easy to query. Beyond these results, an additional table is used to save one record per result: 'Crosswalk additional result'.
This table is used to save the first twenty records linking results in a flat data structure that is easy to query.
Source (logical name: Crosswalk).
Properties | Definition |
---|---|
Execution date | Date of the 'Records linking analysis' execution. |
User | User in charge of the execution. |
Crosswalk process policy | The crosswalk process policy's identifying 'Code' that has been used for these results. |
Crosswalk matching policy | The crosswalk matching policy's identifying 'Code' that has been used for these results. |
Source table | Link to the source table used for the 'Records linking analysis' service. |
Source record | Link to the source record for which the 'Records linking analysis' service has been executed. The primary key value is displayed. |
Record linking 01 | |
Table | Link to the target table having a cross-reference with the source table. |
Record | Link to the target record having a cross-reference with the source record. The primary key value is displayed. |
Score | Matching score of the target field against the source table field. |
Table 47: Crosswalk
This table is used to save any additional results—beyond the initial twenty—from running the 'Records linking analysis' service. Every result is saved in one record.
Source (logical name: CrosswalkAdditionalResult).
Properties | Definition |
---|---|
dataspace | dataspace where the target table is located. |
dataset | dataset where the target table is located. |
Table | Table that has a cross-reference with the source table . |
Record | Link to the target record having a cross-reference with the source record. The primary key value is displayed. |
Score | Matching score of the target field against the source table field. |
Source record | Foreign key to corresponding source record in 'Crosswalk' table. |
Table 48: Crosswalk additional result