Cognate Query
A cognate query is a more complex version of a simple query. It uses the same string comparison algorithm but performs the match with multiple query strings. For instance, a first and last name can be searched against the first and last name fields of the database.
This aids in finding transposed fields (where a first name occurs in the last name field or vice versa).
The difference between a cognate query and two or more simple queries combined with an AND is clearest when the query contains blocks of repeated text. Assume that the query is for the name John, Johnson, and a record contains the name Frank, Johnson.
With two simple queries, each matched against both the first and last name fields, we would first match John against Frank, Johnson and find a reasonable match, since Johnson resembles John. We would then match Johnson against Frank, Johnson and find a perfect match. When computed as a cognate query, however, the first name John is not permitted to match against Johnson, since that section of the record is already claimed by the better match with Johnson. With the combined simple queries, the last name field of the record is matched twice, producing a higher record score that is not valid in this context.
Use cognate queries when you are matching a set of fields that together comprise a single value against a similar set of fields, but have no assurance that the data has been placed in the correct field.
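The double counting described above can be illustrated with a toy model. The sketch below is not the TIBCO Patterns matcher: it assumes whole-field comparisons, uses Python's difflib as a stand-in similarity measure, and the function names are hypothetical. It only shows why letting each record field be claimed once lowers the score for John, Johnson against Frank, Johnson.

```python
from difflib import SequenceMatcher
from itertools import permutations


def similarity(a, b):
    """Toy string similarity in [0, 1]; a stand-in for the real comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_simple_score(queries, fields):
    """Each query string independently takes its best field, so one field
    can be counted more than once (the AND-of-simple-queries behaviour)."""
    return sum(max(similarity(q, f) for f in fields) for q in queries) / len(queries)


def cognate_score(queries, fields):
    """Each field may be claimed by at most one query string.
    Tries every one-to-one pairing and keeps the best total."""
    best = 0.0
    for perm in permutations(range(len(fields)), len(queries)):
        total = sum(similarity(q, fields[i]) for q, i in zip(queries, perm))
        best = max(best, total)
    return best / len(queries)


queries = ["John", "Johnson"]   # first name, last name
record = ["Frank", "Johnson"]   # first name field, last name field

simple = combined_simple_score(queries, record)
cognate = cognate_score(queries, record)
print(f"combined simple queries: {simple:.3f}")
print(f"cognate-style matching:  {cognate:.3f}")
```

In this toy model the combined simple queries score higher because both John and Johnson draw on the record's last name field, while the one-to-one pairing leaves John to be compared with Frank.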
This is called a cognate query because every query string has a unique cognate, or corresponding, field. Thus there is a one-to-one correspondence between query strings and the selected field set for the query. This implies that the length of the LPAR_BLKARR_SEARCHQUERY array must always equal the length of the LPAR_STRARR_FIELDNAMES array. The LPAR_DBLARR_QFIELDWEIGHTS array, if supplied, must also be the same length. Query strings are preferentially matched against their cognate field but might be matched against any field in the field set. LPAR_DBL_NONCOG_WGT is a weighting factor, or penalty, applied when a value from a query string is matched to a non-cognate field. It must be a value between 0.0 and 1.0 inclusive.
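The length and range constraints above can be checked before a query is built. The following is a minimal sketch that assumes the parameters are held in a plain Python dictionary keyed by parameter name; the dictionary layout and the validate_cognate_params helper are hypothetical and are not part of the TIBCO Patterns API.

```python
def validate_cognate_params(params):
    """Return a list of problems found in a dict of cognate query parameters.

    Hypothetical helper: 'params' maps parameter names to plain Python values.
    """
    errors = []
    queries = params.get("LPAR_BLKARR_SEARCHQUERY", [])
    fields = params.get("LPAR_STRARR_FIELDNAMES", [])
    weights = params.get("LPAR_DBLARR_QFIELDWEIGHTS")  # optional
    noncog = params.get("LPAR_DBL_NONCOG_WGT")         # optional

    # One query string per cognate field.
    if len(queries) != len(fields):
        errors.append("query strings and field names differ in length")
    # Field weights, when given, are parallel to the query strings.
    if weights is not None and len(weights) != len(queries):
        errors.append("field weights and query strings differ in length")
    # The non-cognate weight must lie in [0.0, 1.0].
    if noncog is not None and not 0.0 <= noncog <= 1.0:
        errors.append("non-cognate weight is outside 0.0 to 1.0")
    return errors


good = {
    "LPAR_BLKARR_SEARCHQUERY": ["John", "Smith"],
    "LPAR_STRARR_FIELDNAMES": ["first", "last"],
    "LPAR_DBLARR_QFIELDWEIGHTS": [1.0, 1.0],
    "LPAR_DBL_NONCOG_WGT": 0.8,
}
bad = {
    "LPAR_BLKARR_SEARCHQUERY": ["John", "Smith"],
    "LPAR_STRARR_FIELDNAMES": ["first"],
    "LPAR_DBL_NONCOG_WGT": 1.5,
}
print(validate_cognate_params(good))
print(validate_cognate_params(bad))
```

The field names "first" and "last" are placeholders for whatever field names the table actually defines.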
Thus, in the first and last name example above, if a LPAR_DBL_NONCOG_WGT value of 0.8 were given and a query of John, Smith were matched against a record of Smitty, Johnson, there would be a nearly perfect (normal score) match of John to Johnson and Smith to Smitty. Because these matches are against non-cognate fields (first to last, last to first), the score would be penalized by a factor of 0.8.
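The arithmetic of this example can be sketched as follows. The similarity values (0.9 for John to Johnson and for Smith to Smitty, 0.0 otherwise) are made-up placeholders, and averaging over whole-field pairings is an assumption made for illustration; the server's actual scoring is more detailed. The function name is hypothetical.

```python
from itertools import permutations


def best_cognate_assignment(sim, noncog_wgt):
    """Pick the one-to-one pairing of query strings to fields with the best score.

    sim[i][j] is the similarity of query string i to field j; field i is the
    cognate of query string i. Non-cognate pairings are scaled by noncog_wgt.
    """
    if not 0.0 <= noncog_wgt <= 1.0:
        raise ValueError("noncog_wgt must be between 0.0 and 1.0 inclusive")
    n = len(sim)
    best_score, best_perm = -1.0, None
    for perm in permutations(range(n)):
        total = sum(
            sim[i][j] * (1.0 if i == j else noncog_wgt)
            for i, j in enumerate(perm)
        )
        if total > best_score:
            best_score, best_perm = total, perm
    return best_score / n, best_perm


# Query: first=John, last=Smith. Record: first=Smitty, last=Johnson.
# Rows are query strings, columns are record fields (hypothetical values).
sim = [
    [0.0, 0.9],  # John  vs Smitty, John  vs Johnson
    [0.9, 0.0],  # Smith vs Smitty, Smith vs Johnson
]
score, pairing = best_cognate_assignment(sim, noncog_wgt=0.8)
print(pairing, round(score, 2))
```

With these placeholder values the transposed pairing wins, and its unpenalized score of 0.9 is reduced to 0.9 × 0.8 = 0.72.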
In some cases a field in a cognate match is empty. The most common example is the middle name when matching first, middle, and last name fields in English-speaking countries, where middle names are often not used. When matching one name against another, if one has a middle name and the other does not, the match should not be penalized for the unmatched middle name. Because the cognate query allows for the misfielding of data, it is not valid to look only at the middle name field and adjust scores for not matching that field. Instead, the key is the difference in the number of populated fields. When the number of populated fields differs, LPAR_DBL_COGNATE_EMPTY_PENALTY adjusts the penalty applied to the unmatched data in the “extra” populated fields: the penalty is multiplied by the LPAR_DBL_COGNATE_EMPTY_PENALTY value. A value of 1.0 applies the full penalty for unmatched data in the extra populated fields, a value of 0.0 applies no penalty, and a value of 0.1 applies one tenth of the penalty. For examples and more detailed information, see the TIBCO Patterns Concepts Guide.
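The scaling described above can be sketched in a few lines. How the server computes the underlying penalty for unmatched data is not described here, so the base penalty of 0.2 is a made-up placeholder and both helper functions are hypothetical.

```python
def populated_count(values):
    """Number of fields that contain something other than whitespace."""
    return sum(1 for v in values if v and v.strip())


def adjusted_penalty(base_penalty, cognate_empty_penalty=1.0):
    """Scale the penalty for unmatched data in the extra populated fields."""
    return base_penalty * cognate_empty_penalty


query = ["John", "Quincy", "Adams"]   # first, middle, last
record = ["John", "", "Adams"]        # middle name not populated

extra_fields = abs(populated_count(query) - populated_count(record))
base_penalty = 0.2  # hypothetical penalty for the unmatched middle name

if extra_fields > 0:
    for factor in (1.0, 0.1, 0.0):
        print(factor, round(adjusted_penalty(base_penalty, factor), 3))
```

The adjustment only comes into play here because the two sides differ in the number of populated fields.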
A cognate query is specified as an LPAR_LST_COGQUERY, which contains the following parameters:
• LPAR_BLKARR_SEARCHQUERY specifies the query strings to be used for this comparison.
• LPAR_INTARR_SELECTFLDS selects the record fields against which this querylet is compared. Fields are specified by number with this option. Note that it is impossible to reference a Variable Attribute value using field numbers.
• LPAR_STRARR_FIELDNAMES selects the record fields against which this querylet is compared. Fields are specified by name with this option. The field name can include a Variable Attribute qualifier (see Variable Attributes - Usage Details for details).
• LPAR_DBLARR_QFIELDWEIGHTS (optional) specifies the weight for matched text against each field in the LPAR_INTARR_SELECTFLDS or LPAR_STRARR_FIELDNAMES array. The maximum weight is 1.0, with values less than 1.0 representing penalized matches.
• LPAR_DBL_INVALID_DATA_SCORE (optional) is the score to return for a record if either a query or a record field has invalid data. This supersedes, on a query-by-query basis, the value set in the search parameters. There is no such thing as invalid text data, so this option is not currently relevant to cognate queries and appears only to maintain consistency with other query forms.
• LPAR_DBL_EMPTY_DATA_SCORE (optional) is the score to return for a record if either a query or a record field set is empty. This supersedes, on a query-by-query basis, the value set in the search parameters.
• LPAR_BOOL_MATCH_EMPTY controls behavior when a query string and the data it is being matched to are both empty. If true, a 1.0 score is generated. If false, the empty-score is used.
Default: false
• LPAR_INT_SORTSCORE (optional) sets the score type that is considered to be the match score for this query. This supersedes, on a query-by-query basis, the value set in the search parameters.
• LPAR_STR_THESAURUSNAME (optional) is the thesaurus to use for just this querylet. If LPAR_STR_THESAURUSNAME is specified, LPAR_LST_THESAURUS must not be.
• LPAR_LST_THESAURUS (optional) defines a thesaurus to use for just this querylet. If LPAR_LST_THESAURUS is specified, LPAR_STR_THESAURUSNAME must not be. The list must consist of two lpars: the thesaurus_options and thesaurus_data arguments to the lkt_create_thesaurus command. The thesaurus defined by this argument is created for, exists only for the duration of, and is local to this query. See Thesaurus Matching for more information about defining a thesaurus and the TIBCO Patterns servers' thesaurus support in general, and Ephemeral Thesauri for more information on these ephemeral thesauri in particular.
• LPAR_DBL_THESAURUSWEIGHT (optional) is a penalty factor that is applied to every thesaurus substitution. This penalty factor is applied for all thesaurus types, but only if the thesaurus match between query and record is between two different terms (for example, it applies to a match of Peggy to Margaret, but not to a match of Peggy to Peggy). If a combined thesaurus is being used, the penalty factor defined in the thesaurus for the class is multiplied by the factor provided here to obtain the final penalty factor.
• LPAR_DBL_NONCOG_WGT (optional) is a penalty factor to be applied to transposed field matches.
• LPAR_DBL_COGNATE_EMPTY_PENALTY (optional) is the adjustment to the penalty applied for unmatched data in extra fields. The default value is 1.0 (full penalty).
• LPAR_LST_QOPTS (optional) is a list of detailed tuning parameters that control how the TIBCO Patterns servers score matches. These values rarely need to be changed, and should be changed only in consultation with your TIBCO representative.
• LPAR_STR_CHARMAP (optional) is the name of the character map to be applied to the query values. By default all fields referenced in a simple query must have the same character map. If a query is to be performed across fields that use two or more different character maps, then this option must be supplied to specify which map is to be used.
Note: Setting the LPAR_STR_CHARMAP option might result in invalid or incorrect results. Consult your TIBCO representative before using this option.
• LPAR_STR_QLETGROUP (optional) assigns a group name to this querylet. Group names are used by the GIP prefilter to improve the selection of child records in joined searches. See the “Matching Compound Records and Querylet Grouping” section in the TIBCO Patterns Concepts Guide for more information on querylet grouping and when to use it.
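Several of the parameters above constrain one another. The sketch below assumes, purely for illustration, that a cognate query is held as a Python dictionary keyed by parameter name; it does not call the TIBCO Patterns API, and the helper names and the 0.9 class factor are hypothetical. It checks the thesaurus exclusivity rule and the 1.0 ceiling on field weights, and shows how the final thesaurus penalty factor is formed for a combined thesaurus.

```python
def check_option_conflicts(cogquery):
    """Return a list of problems in a dict of cognate query parameters."""
    problems = []
    # Only one of the two thesaurus parameters may be supplied.
    if "LPAR_STR_THESAURUSNAME" in cogquery and "LPAR_LST_THESAURUS" in cogquery:
        problems.append("specify a thesaurus by name or by definition, not both")
    # Field weights have a maximum of 1.0.
    for weight in cogquery.get("LPAR_DBLARR_QFIELDWEIGHTS", []):
        if weight > 1.0:
            problems.append(f"field weight {weight} exceeds the maximum of 1.0")
    return problems


def final_thesaurus_penalty(class_factor, thesaurus_weight):
    """Combined thesaurus: class penalty factor times the per-query factor."""
    return class_factor * thesaurus_weight


cogquery = {
    "LPAR_BLKARR_SEARCHQUERY": ["Peggy", "Smith"],
    "LPAR_STRARR_FIELDNAMES": ["first", "last"],
    "LPAR_DBLARR_QFIELDWEIGHTS": [1.0, 0.9],
    "LPAR_STR_THESAURUSNAME": "nicknames",
    "LPAR_DBL_THESAURUSWEIGHT": 0.8,
}
print(check_option_conflicts(cogquery))
print(round(final_thesaurus_penalty(0.9, cogquery["LPAR_DBL_THESAURUSWEIGHT"]), 2))
```

The field names and the thesaurus name "nicknames" are placeholders for values defined on the server.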