Defining Match Cases

The Custom Netrics Query is a sample custom query implemented on the Customer repository model. To address the problems of data matching, Netrics provides MatchCase, QueryLet, grouping, and penalty score functionality, and the Customer model query is implemented using these features. The weights are configurable, so the query can be tuned to each use case.

A common matching problem involves entities described by a set of information that can be split into categories. For example, the standard customer model is used for the person record matching use case. The customer model matching query uses the following categories:

  1. Name
  2. Date of Birth
  3. Gender
  4. ID (SSN)
  5. Phone Number
  6. Email Address
  7. Address

For each category, a QueryLet is defined that produces a match strength score for that category. The problem arises when combining the category match strength scores into an overall match score: the standard AND combiner cannot express the complexities of deciding a match from the match strengths of the individual categories. There are two primary reasons for this:
  1. Scores are not comparable: A match strength score of 0.85 for an Address may represent a fairly good match for a street address, but a match strength score of 0.85 for an SSN, phone number, or email address probably represents a fairly poor match. What we want to establish is that the two records contain the same value, and each category needs its own match strength criterion before two values can be called probably the same.
  2. Categories are not interchangeable: Some categories are required to determine a match, while others are almost irrelevant or play only a supporting role. In this example, no matter how well the other categories match, without a Name or ID match you cannot say that the two records represent the same person.
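To make the "scores are not comparable" point concrete, the sketch below gives each category its own threshold for deciding that two values are probably the same. The threshold values and the `is_same_value` helper are hypothetical, chosen only for illustration; they are not Netrics defaults or a Netrics API.

```python
# Hypothetical per-category thresholds: the raw match strength a category must
# reach before two values are treated as "probably the same". Illustrative only.
SAME_VALUE_THRESHOLD = {
    "Name": 0.85,
    "DOB": 0.95,
    "ID": 0.98,
    "Phone": 0.97,
    "Email": 0.97,
    "Address": 0.80,
}


def is_same_value(category: str, score: float) -> bool:
    """Return True if the raw score is strong enough for this category."""
    return score >= SAME_VALUE_THRESHOLD[category]


if __name__ == "__main__":
    # The same raw score of 0.85 passes for Address but fails for ID and Email.
    for category in ("Address", "ID", "Email"):
        print(category, is_same_value(category, 0.85))
```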

The current AND structure can capture some of this complexity through the ignore score, reject score, and QueryLet weighting features. However, it cannot handle non-comparable scores, and the current constructs cannot capture all aspects of merging category scores to determine the likelihood of a match. What is needed is a construct that directly expresses the human matching process.
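The limitation can be illustrated with a simplified weighted AND combiner. The sketch below assumes hypothetical semantics: a QueryLet scoring below its ignore score is left out of the weighted average, and one scoring below its reject score rejects the record. `QueryletConfig` and `weighted_and` are illustrative names, not part of the Netrics product.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryletConfig:
    """Illustrative per-QueryLet settings; names and semantics are assumptions."""
    weight: float = 1.0
    ignore_below: Optional[float] = None  # below this: left out of the average
    reject_below: Optional[float] = None  # below this: whole record rejected


def weighted_and(scores: Dict[str, float], config: Dict[str, QueryletConfig]) -> float:
    """Combine QueryLet scores as a weighted average with ignore/reject cut-offs."""
    total = 0.0
    weight_sum = 0.0
    for name, score in scores.items():
        cfg = config[name]
        if cfg.reject_below is not None and score < cfg.reject_below:
            return 0.0  # a single failing QueryLet rejects the record outright
        if cfg.ignore_below is not None and score < cfg.ignore_below:
            continue  # ignored QueryLets neither help nor hurt
        total += cfg.weight * score
        weight_sum += cfg.weight
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    config = {q: QueryletConfig() for q in ("Name", "ID", "Address")}
    # A weak ID score (probably a different SSN) and a fairly good Address score
    # contribute exactly the same amount to the combined score.
    weak_id = weighted_and({"Name": 0.95, "ID": 0.85}, config)
    good_address = weighted_and({"Name": 0.95, "Address": 0.85}, config)
    print(weak_id == good_address)  # True

    # A reject score on Name can make Name mandatory on its own.
    config["Name"] = QueryletConfig(reject_below=0.8)
    print(weighted_and({"Name": 0.5, "ID": 1.0}, config))  # 0.0
```

Each cut-off here applies to one QueryLet at a time, so a combiner of this shape cannot say "Name plus any one of DOB, ID, Email, or Address", and it still averages scores that live on different scales.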

The following match cases are defined:
  • Case 1: Name and DOB as core QueryLets.
  • Case 2: Name and ID as core QueryLets.
  • Case 3: Name and Email as core QueryLets.
  • Case 4: Name and Address as core QueryLets.
  • Case 5: Name, DOB and ID as core QueryLets.
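One way to picture how match cases work is the sketch below: a record pair satisfies a case only when every core QueryLet of that case reaches its category threshold, and the best satisfied case determines the result. The case list comes from the text above; the thresholds, the mean-of-core-scores rule, and the `best_match_case` function are assumptions for illustration, not the Netrics implementation.

```python
from typing import Dict, List, Optional, Tuple

# Core QueryLets of each match case, as listed in the text.
MATCH_CASES: Dict[str, List[str]] = {
    "Case 1": ["Name", "DOB"],
    "Case 2": ["Name", "ID"],
    "Case 3": ["Name", "Email"],
    "Case 4": ["Name", "Address"],
    "Case 5": ["Name", "DOB", "ID"],
}

# Hypothetical per-category thresholds; illustrative values, not product defaults.
THRESHOLD: Dict[str, float] = {
    "Name": 0.85, "DOB": 0.95, "ID": 0.98, "Email": 0.97, "Address": 0.80,
}


def best_match_case(scores: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return (case, score) for the best case whose core QueryLets all reach
    their thresholds, or (None, 0.0) if no case is satisfied.

    The case score is the plain mean of its core QueryLet scores (an assumption).
    """
    best_case: Optional[str] = None
    best_score = 0.0
    for case, core in MATCH_CASES.items():
        if all(scores.get(q, 0.0) >= THRESHOLD[q] for q in core):
            case_score = sum(scores[q] for q in core) / len(core)
            if case_score > best_score:
                best_case, best_score = case, case_score
    return best_case, best_score


if __name__ == "__main__":
    # Strong Name and ID, weak DOB and Address: only Case 2 applies.
    print(best_match_case({"Name": 0.9, "DOB": 0.5, "ID": 1.0, "Address": 0.6}))
    # Every other category is perfect, but Name is weak: no case applies.
    print(best_match_case({"Name": 0.6, "DOB": 1.0, "ID": 1.0,
                           "Email": 1.0, "Address": 1.0}))  # (None, 0.0)
```

Because each case is checked against its own thresholds, non-comparable scores are never averaged across cases, and Name stays mandatory simply because it is core in every case.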

For each match case, six QueryLets are defined:

  • QueryLet 1 - Name: The values of First Name, Last Name, Middle Name, and NameSuffix are treated as one QueryLet.
  • QueryLet 2 - DOB: The Date of Birth value is treated as one QueryLet.
  • QueryLet 3 - IDs: The NationalReferenceNumber value is treated as one QueryLet.
  • QueryLet 4 - Email: The values of the Email child records are treated as one QueryLet.
  • QueryLet 5 - Address: The values of the Address child records, such as street and pincode, are treated as one QueryLet.
  • QueryLet 6 - Phone: The values of the PhoneNumber child records are treated as one QueryLet.
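The sketch below shows one way the two kinds of QueryLet could produce a single score: the Name QueryLet joins the name parts into one value, while a child-record QueryLet (Email, Address, Phone) takes the best score over all pairs of child values. It uses Python's `difflib.SequenceMatcher` purely as a stand-in similarity measure; Netrics uses its own matching engine. The attribute names in `NAME_FIELDS` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical attribute names for the parts of a person's name.
NAME_FIELDS = ("FirstName", "MiddleName", "LastName", "NameSuffix")


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for Netrics scoring."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_querylet(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> float:
    """Name QueryLet: compare all name parts together as one value."""
    def full_name(rec: Dict[str, str]) -> str:
        parts = (rec.get(field, "") for field in NAME_FIELDS)
        return " ".join(p for p in parts if p)

    full_a, full_b = full_name(rec_a), full_name(rec_b)
    if not full_a or not full_b:
        return 0.0  # no name on one side, so nothing to compare
    return similarity(full_a, full_b)


def child_querylet(values_a: List[str], values_b: List[str]) -> float:
    """Child-record QueryLet (Email, Address, Phone): best score over all pairs."""
    if not values_a or not values_b:
        return 0.0  # no child records on one side
    return max(similarity(a, b) for a in values_a for b in values_b)


if __name__ == "__main__":
    a = {"FirstName": "John", "LastName": "Smith"}
    b = {"FirstName": "Jon", "LastName": "Smith", "NameSuffix": "Jr"}
    print(round(name_querylet(a, b), 2))
    print(child_querylet(["jsmith@example.com", "john@work.example"],
                         ["JSmith@Example.com"]))  # 1.0
```

Taking the maximum over child records means a person with several email addresses is matched on whichever address agrees best, rather than being penalized for the others.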

The weights are applied based on the record data, and the results are returned and displayed in the matcher work item. The Customer query is as follows:

And(
    Person.first = "<Person.first>",
    Person.last = "<Person.last>",
    Or(
       Email1.address = "<Email1.address>",
       Email2.address = "<Email2.address>",
       Email3.address = "<Email3.address>"),
    Or(
       Phone1.number = "<Phone1.number>",
       Phone2.number = "<Phone2.number>"),
    Or(
       And(
           Address1.street = "<Address1.street>",
           Address1.city = "<Address1.city>",
           Address1.state = "<Address1.state>"),
       And(
           Address2.street = "<Address2.street>",
           Address2.city = "<Address2.city>",
           Address2.state = "<Address2.state>"),
       And(
           Address3.street = "<Address3.street>",
           Address3.city = "<Address3.city>",
           Address3.state = "<Address3.state>"),
       And(
           Address4.street = "<Address4.street>",
           Address4.city = "<Address4.city>",
           Address4.state = "<Address4.state>")
    )
)
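The `<Entity.attribute>` placeholders in the query stand for values taken from the record being matched. The sketch below fills them in from a record represented as a dictionary keyed by `Entity.attribute`; that representation and the `fill_query` function are hypothetical. Missing attributes become empty strings, and quoting or escaping of values is not handled because it depends on the query syntax.

```python
import re
from typing import Dict

# Placeholders look like <Entity.attribute>, e.g. <Person.first> or <Address1.street>.
PLACEHOLDER = re.compile(r"<([A-Za-z0-9_]+\.[A-Za-z0-9_]+)>")


def fill_query(template: str, record: Dict[str, str]) -> str:
    """Replace each <Entity.attribute> placeholder with the record's value.

    Attributes missing from the record are replaced with an empty string.
    """
    return PLACEHOLDER.sub(lambda m: record.get(m.group(1), ""), template)


if __name__ == "__main__":
    template = (
        "And(\n"
        '    Person.first = "<Person.first>",\n'
        '    Person.last = "<Person.last>",\n'
        "    Or(\n"
        '       Email1.address = "<Email1.address>",\n'
        '       Email2.address = "<Email2.address>"))'
    )
    record = {"Person.first": "John", "Person.last": "Smith",
              "Email1.address": "jsmith@example.com"}
    print(fill_query(template, record))
```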