Each record is associated with a well-defined state that indicates its match level: Suspicious, Suspect, Pivot, Golden, Merged, Unmatched, To be matched and Deleted.
Operational systems (transaction, business intelligence, reporting, etc.) should only be supplied with Golden records as input. Only these records are considered to have no risk of duplication with regard to other information contained in the dataset.
The set of available states is fixed and cannot be extended beyond these eight states. It is a finite list that the add-on uses to carry out its matching functions.
The following sub-states have been added: 'Was golden', 'Definitive golden', 'Ignore' (merged without target), 'Not suspect with', 'Under workflow', 'From match at once', 'From group at once' and 'Merge'.
The add-on can be configured to execute either direct or simple matching. Interaction with each state depends on the type of matching used.
When direct matching is configured, the new or updated record's state is computed directly by the add-on. For example, the record can be considered as a golden record directly, or a pivot with related suspect records.
When simple matching is configured, every new or updated record is matched by the add-on, but the decision regarding its state requires human action. The add-on moves the record into an intermediate 'Suspicious' state. However, if the creation or modification of a record results in a matching score that is compliant with the level of a golden record, the add-on puts the record into the golden state and bypasses the human decision.
The automatic merge is not applied when the simple matching configuration is used. Indeed, a human decision is mandatory to manage the suspicious record.
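The difference between the two configurations can be sketched as follows. This is a minimal illustration, not the add-on's API: the names `State`, `MatchingMode`, `initial_state` and `automatic_merge_possible` are hypothetical, and the flag `qualifies_as_golden` stands in for the add-on's own score check, which is not detailed here. Under direct matching the sketch returns only the pivot case as one example outcome.

```python
from enum import Enum


class State(Enum):
    """The eight record states listed in the text."""
    SUSPICIOUS = "Suspicious"
    SUSPECT = "Suspect"
    PIVOT = "Pivot"
    GOLDEN = "Golden"
    MERGED = "Merged"
    UNMATCHED = "Unmatched"
    TO_BE_MATCHED = "To be matched"
    DELETED = "Deleted"


class MatchingMode(Enum):
    DIRECT = "direct"
    SIMPLE = "simple"


def initial_state(mode: MatchingMode, qualifies_as_golden: bool) -> State:
    """Hypothetical helper: state given to a new or updated record."""
    if qualifies_as_golden:
        # In both modes a record can become golden without human action.
        return State.GOLDEN
    if mode is MatchingMode.SIMPLE:
        # Simple matching: a user decides later (golden, pivot or deleted).
        return State.SUSPICIOUS
    # Direct matching: the add-on decides itself. Assumption for this
    # sketch: the record becomes a pivot with related suspect records.
    return State.PIVOT


def automatic_merge_possible(mode: MatchingMode) -> bool:
    # The automatic merge is not applied under simple matching.
    return mode is not MatchingMode.SIMPLE
```

For instance, `initial_state(MatchingMode.SIMPLE, False)` yields `State.SUSPICIOUS`, which reflects the mandatory human decision described above.
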
Record level matching (State) | Definition | Example |
---|---|---|
Suspicious | A record that likely has duplicate records. The add-on can be configured to suggest that the record is a suspicious record and needs to be verified. A suspicious record has not been classified as a golden, pivot or suspect record. A user task is required to decide the suspicious record's next state: golden, pivot (to be merged with suspect records) or deleted (see the definitions of the 'Pivot' and 'Suspect' states). This only applies when the add-on is configured for 'Simple matching'. 'Suspicious' records are systematically attached to the '004' cluster. They are not attached to a specific cluster identifier with a set of suspect records, since suspicious records are not yet considered as pivot records. | A user creates the employee record 'Mooree', which has two employees as potential suspect records: Moore and Zooree. 'Mooree' is the suspicious record, and a user must decide whether it becomes a golden record directly (the suspicious record is considered a legitimate unique record) or whether the record creation is canceled (the user determines that the record already exists; see the definition of the 'Deleted' state). In both cases, the Moore and Zooree records are not changed. The user can also decide that Mooree is a duplicate and set it as a pivot against Moore and Zooree for a subsequent merge (Stewardship task). In that case, Moore and Zooree are automatically changed to suspect records, based on the user's decision (see the definitions of the 'Pivot' and 'Suspect' states). |
Suspect | A record that is potentially a duplicate against a record with either a 'Pivot' or 'Golden' matching state. The add-on computes a similarity percentage score against the pivot or golden record. Then, a set of suspected related records is grouped into a cluster with a unique identifier. | The employee 'Mooree' is a suspect towards 'Moor' with a similarity percentage level of 80%. Moor is the record considered as the pivot or golden record. |
Pivot | A record that is likely to be the best record to use amongst a set of suspect records. The similarity percentage of a pivot record is systematically set to 100%. | (Figures: group of records; another group of records) |
Golden | A record that is either a unique record to be used directly, or the best record to use amongst a set of former suspect records. A former suspect record is one that has since been merged into the golden record. This merge can include specific fields only, use the entire suspect record or reject the suspect record (that is, ignore it). Operational systems (transaction, business intelligence, etc.) should only be given golden records as input. The similarity percentage of a golden record is systematically set to 100%. | Helen, golden, 100%; Bonnet, golden, 100% (Figure: group of records) |
Merged | A record that has undergone a merging procedure into a pivot or golden record. A merged record keeps track of its targeted record. If a merged record is moved to a new cluster, its target record is updated to the new cluster's golden/pivot record. | (Figure: group of records) |
Unmatched | A record that is not yet matched. Its similarity percentage is not relevant and is thus set to '-1'. An unmatched record is attached to the predefined '000' cluster or located in a normal cluster through the 'Group at once unmatched' operation. This state is used when a table under add-on control is temporarily deactivated (see the 'On matching Process', 'On creation' and 'On modification' properties in the Process policy configuration). | Jonnhy, unmatched, -1; Jhonny, unmatched, -1; Petrian, unmatched, -1 |
To be matched | A record that is not yet matched but is waiting for a match. This state is used when importing data. During the import all records default to this state. Then, a service allows you to match all 'To be matched' records at once. Its similarity percentage is not yet relevant and thus is set to '-1' by default. 'To be matched' records are attached to the '002' predefined cluster or located in a normal cluster through the 'Group at once to be matched' operation. | Dupand, to be matched, -1; Johnway, to be matched, -1 |
Deleted | A record that has been logically deleted. | Dobbon, deleted |
Table 50: Level of matching - state record definition
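The fixed values in Table 50 can be collected into a small lookup, shown here as an illustrative sketch. The names `PREDEFINED_CLUSTER`, `FIXED_SIMILARITY` and `golden_only`, and the dictionary layout of a record, are assumptions made for this example and are not part of the add-on. The sample rows reuse the example names from the table.

```python
# Predefined clusters named in the table. An unmatched or 'to be matched'
# record may instead sit in a normal cluster after a 'Group at once' operation.
PREDEFINED_CLUSTER = {
    "unmatched": "000",
    "to be matched": "002",
    "suspicious": "004",
}

# Similarity values that the table describes as fixed for a given state.
FIXED_SIMILARITY = {
    "pivot": 100,
    "golden": 100,
    "unmatched": -1,
    "to be matched": -1,
}


def golden_only(records):
    """Keep only the records that operational systems should receive."""
    return [r for r in records if r["state"] == "golden"]


records = [
    {"name": "Helen", "state": "golden", "similarity": 100},
    {"name": "Jonnhy", "state": "unmatched", "similarity": -1},
    {"name": "Dupand", "state": "to be matched", "similarity": -1},
    {"name": "Dobbon", "state": "deleted", "similarity": None},
]

print([r["name"] for r in golden_only(records)])  # prints ['Helen']
```

Filtering on the state alone is enough here because only golden records are considered free of duplication risk.
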
Record level matching (Sub-state) | Definition | Example |
---|---|---|
Was golden | Boolean value. A record is 'Was golden'='Yes' when it was a golden record before its current state. | Time 01, creation of a direct golden record; Time 02, after a match table |
Definitive golden | A definitive golden record is no longer used when a match against the table is performed. Definitive golden records are located in the '003' predefined cluster. | Time 01, direct golden creation and set up to definitive golden by a user; Time 02 and after |
Ignore | A record that is set as not relevant for merging. The ignored record moves to the merged state with no target value (no actual merge). | Harry, pivot, 100%; Harrryy, merged, 80%, target=Harry; Barry, merged, 60%, target=null. In this example the record Barry has been ignored (state merged with target=null). |
Not suspect with | List of records against which this record is not considered a suspect record during subsequent matching operations. | Bonnet, 'Not suspect with'='Monnet, Monney'. This means that the Bonnet record will no longer be considered as a suspect against the Monnet and Monney records. |
Under workflow | A record that is under workflow control. | Bannet, Suspicious, workflow=management of suspicious records |
From match at once | A record whose latest modification has been done by a 'match at once' operation (operation applied to a set of records). | First stage: Pannet, Dannet and Ganet are all unmatched records. After executing 'match at once unmatched records', the add-on keeps in memory that these records have been modified by the match at once operation: Pannet, pivot; Dannet, suspect, 50%; Ganet, suspect, 30%. As soon as an operation is performed on any of these records, it is no longer considered a 'From match at once' record. |
From group at once | A record that has been grouped in a cluster with the 'Group at once (unmatched)' service or 'Group at once (to be matched)' service. | Cluster 233, Jonnhy, unmatched, -1; Cluster 233, Jhonny, unmatched, -1; Cluster 233, Petrian, unmatched, -1 |
Merge | When a record has been merged, this sub-state indicates whether the merging has been done manually by a user, or automatically by the system (survivorship). | Cluster 111, Bonnet, golden; Cluster 111, Mozzet, merged, User; Cluster 111, Bonnnet, Auto |
Table 51: Level of matching - sub-state record definition
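Two of the sub-states in Table 51 lend themselves to a short sketch: 'Ignore' (a merged record with no target) and 'Not suspect with' (an exclusion list used during later matching). The `Record` class and the helper names below are hypothetical and only model the table's examples. Whether the add-on treats 'Not suspect with' as symmetric between two records is not stated, so the sketch checks one direction only.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Record:
    name: str
    state: str
    similarity: Optional[int] = None
    target: Optional[str] = None  # name of the record this one was merged into
    not_suspect_with: Set[str] = field(default_factory=set)


def is_ignored(record: Record) -> bool:
    """'Ignore': the record is in the merged state but has no target."""
    return record.state == "merged" and record.target is None


def suspect_candidates(record: Record, others: list) -> list:
    """Drop the records listed in this record's 'Not suspect with' list."""
    return [o for o in others if o.name not in record.not_suspect_with]


harrryy = Record("Harrryy", "merged", 80, target="Harry")
barry = Record("Barry", "merged", 60, target=None)
print(is_ignored(harrryy), is_ignored(barry))  # prints False True

bonnet = Record("Bonnet", "golden", 100, not_suspect_with={"Monnet", "Monney"})
others = [Record("Monnet", "golden", 100), Record("Bonnot", "golden", 100)]
print([o.name for o in suspect_candidates(bonnet, others)])  # prints ['Bonnot']
```

The record 'Bonnot' is an invented name used only to show a candidate that is not excluded.
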