Defining Match Cases
The Custom Netrics Query is a sample Custom Query implemented on the Customer Repository model. To address the problems of data matching, Netrics provides MatchCase, QueryLet, Grouping, and Penalty score functionality. The Customer model is implemented using these features. The weights are configurable, so all the use cases can be managed.
A common matching problem involves matching entities where there is a set of information describing the entity that can be split into categories. For example, the standard customer model is used for the person record-matching use case. The customer model matching query splits the information into categories (Name, DOB, IDs, Email, Address, and Phone). Simply combining the category scores raises two problems:
- Scores are not comparable: A match strength score of 0.85 for an Address might represent a fairly good match for a street address, but a match strength score of 0.85 for an SSN, phone number or email address probably represents a fairly poor match. Basically, we want to ascertain that the two records represent the same value. Different categories are likely to have different match strength criteria before we can say they are probably the same value.
- Categories are not interchangeable: Some categories are required to determine a match, while others are almost irrelevant or play only supporting roles. In the example, no matter how well the other categories match, if there is no Name or ID match, you cannot say that the two records represent the same person.
The current AND structure can capture some of this complexity through the ignore scores, reject scores, and querylet weighting features. However, it cannot handle the "scores are not comparable" scenario, nor can the current constructs capture all aspects of merging category scores to determine the likelihood of a match. We need a construct that can directly express the human matching process.
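The two problems above can be illustrated with a minimal Python sketch. It is not the Netrics API: the threshold values, the `CORE` tuple, the function names, and the rule that at least one supporting category must also match are all assumptions made for illustration. Only the gating on Name or IDs comes from the text.

```python
# Hypothetical per-category thresholds: the raw score at which a category is
# treated as "probably the same value". The numbers are illustrative only.
THRESHOLDS = {
    "Name": 0.80,
    "DOB": 0.90,
    "IDs": 0.95,
    "Email": 0.95,
    "Address": 0.75,
    "Phone": 0.95,
}

# Per the text, a Name or ID match is required before two records can be
# considered the same person.
CORE = ("Name", "IDs")


def category_matches(category, score):
    """Compare a raw score against its own category threshold."""
    return score >= THRESHOLDS[category]


def is_probable_match(scores, min_supporting=1):
    """scores maps category name -> raw match strength in [0, 1].

    Assumed rule: at least one core category matches, and at least
    `min_supporting` other categories match as well.
    """
    matched = {c for c, s in scores.items() if category_matches(c, s)}
    if not any(c in matched for c in CORE):
        return False  # no Name or ID match: other categories cannot rescue it
    supporting = matched - set(CORE)
    return len(supporting) >= min_supporting
```

With these assumed thresholds, a raw score of 0.85 counts as a match for Address but not for IDs, so the same number is interpreted differently per category.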
For each match case, six QueryLets are defined:
- QueryLet 1 - Name: Values defined for the First Name, Last Name, Middle Name, and NameSuffix are considered as one QueryLet.
- QueryLet 2 - DOB: Values defined for the Date of Birth are considered as one QueryLet.
- QueryLet 3 - IDs: Values defined for the NationalReferenceNumber are considered as one QueryLet.
- QueryLet 4 - Email: Child record values of the Email are considered as one QueryLet.
- QueryLet 5 - Address: Child record values of the Address, such as street, pincode, and so on, are considered as one QueryLet.
- QueryLet 6 - Phone: Child record values of the PhoneNumber are considered as one QueryLet.
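As a rough illustration of how six QueryLet scores might be merged using configurable weights, the following Python sketch computes a weighted average. The weight values, the function names, the choice of the best-scoring child record, and the decision to skip QueryLets that have no data are assumptions, not the product's actual scoring rules.

```python
# Hypothetical weights, one per QueryLet listed above. Illustrative values only.
QUERYLET_WEIGHTS = {
    "Name": 1.0,
    "DOB": 0.5,
    "IDs": 1.0,
    "Email": 0.5,
    "Address": 0.5,
    "Phone": 0.5,
}


def best_child_score(child_scores):
    """For child records (Email, Address, Phone), keep the best-scoring child.

    Returns None when there are no child records to compare.
    """
    return max(child_scores, default=None)


def combined_score(querylet_scores, weights=QUERYLET_WEIGHTS):
    """Weighted average over the QueryLets that actually have a score.

    A score of None means the record has no data for that QueryLet; in this
    sketch such QueryLets are skipped rather than counted as a mismatch.
    """
    total = 0.0
    weight_sum = 0.0
    for name, score in querylet_scores.items():
        if score is None:
            continue
        total += weights[name] * score
        weight_sum += weights[name]
    return total / weight_sum if weight_sum else 0.0
```

For example, a record with perfect Name and IDs scores, a non-matching Email, and no other data would combine as (1.0 + 1.0 + 0.0) / (1.0 + 1.0 + 0.5) under these assumed weights.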
Based on the record data, the weights are applied and the results are returned. These results are shown in the matcher work item. The Customer query is as follows:
And(
    Person.first = "<Person.first>",
    Person.last = "<Person.last>",
    Or(
        Email1.address = "<Email1.address>",
        Email2.address = "<Email2.address>",
        Email3.address = "<Email3.address>"),
    Or(
        Phone1.number = "<Phone1.number>",
        Phone2.number = "<Phone2.number>"),
    Or(
        And( Address1.street = "<Address1.street>", Address1.city = "<Address1.city>", Address1.state = "<Address1.state>"),
        And( Address2.street = "<Address2.street>", Address2.city = "<Address2.city>", Address2.state = "<Address2.state>"),
        And( Address3.street = "<Address3.street>", Address3.city = "<Address3.city>", Address3.state = "<Address3.state>"),
        And( Address4.street = "<Address4.street>", Address4.city = "<Address4.city>", Address4.state = "<Address4.state>")
    )
)
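The nesting of the query above (an outer And, with one Or per child record type and one inner And per address) can be reproduced with a short Python sketch that builds the same textual notation from record data. It only generates text in the notation shown here. The helper names `eq`, `group`, and `build_customer_query` are hypothetical and are not part of any Netrics API, and values are not escaped.

```python
def eq(field, value):
    """Render a single field comparison in the notation used above."""
    return f'{field} = "{value}"'


def group(op, parts):
    """Render an And(...) or Or(...) group from already-rendered parts."""
    return f"{op}( " + ", ".join(parts) + " )"


def build_customer_query(person, emails, phones, addresses):
    """Build the query text for one customer record (illustrative only).

    person: dict with "first" and "last"
    emails, phones: lists of strings
    addresses: list of dicts with "street", "city", "state"
    """
    email_terms = [eq(f"Email{i}.address", e) for i, e in enumerate(emails, 1)]
    phone_terms = [eq(f"Phone{i}.number", p) for i, p in enumerate(phones, 1)]
    address_terms = [
        group("And", [
            eq(f"Address{i}.street", a["street"]),
            eq(f"Address{i}.city", a["city"]),
            eq(f"Address{i}.state", a["state"]),
        ])
        for i, a in enumerate(addresses, 1)
    ]

    parts = [eq("Person.first", person["first"]), eq("Person.last", person["last"])]
    # Each child record type becomes one Or group; empty groups are omitted.
    for terms in (email_terms, phone_terms, address_terms):
        if terms:
            parts.append(group("Or", terms))
    return group("And", parts)
```

Any one email, phone number, or complete address can satisfy its Or group, while the person's first and last names are always part of the outer And.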