Query Combiners

Query combiners, also known as query expressions, are used to combine multiple input scores into a single score. LKT_QEXPR_AND takes the simple average of the input scores, LKT_QEXPR_OR takes the maximum score, and LKT_QEXPR_RLINK performs a much more sophisticated weighted combination under the control of a TIBCO Patterns machine learning model. The query structure used must be the same as was used to train the TIBCO Patterns Learn model. Your TIBCO technical representative can provide more information on how to set up a query for the RLINK score combiner.
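The arithmetic of the two simplest combiners can be modeled in a few lines of plain C. This is only an illustrative sketch: combine_and and combine_or are hypothetical helpers, not part of the TIBCO Patterns API, and they leave out weights and cutoff options entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of LKT_QEXPR_AND: the simple average of the inputs. */
double combine_and(const double *scores, size_t n)
{
    double sum = 0.0;

    assert(scores != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        sum += scores[i];
    return sum / (double)n;
}

/* Hypothetical model of LKT_QEXPR_OR: the maximum of the inputs. */
double combine_or(const double *scores, size_t n)
{
    double best;

    assert(scores != NULL && n > 0);
    best = scores[0];
    for (size_t i = 1; i < n; i++)
        if (scores[i] > best)
            best = scores[i];
    return best;
}
```

For input scores 1.0, 0.7, and 0.1, this model gives 0.6 for the AND combiner and 1.0 for the OR combiner.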

With release 4.2 the special query expression LKT_QEXPR_RECWGT was added. Unlike the other expressions this modifies a single score rather than combining multiple scores. It multiplies the single score by the value of a field in the record, which must be a Float field type, to produce a new score for the record (resulting scores are constrained to the range 0.0 to 1.0). This provides a way of increasing or decreasing record priorities in matching.

A query expression is specified as an LPAR_LST_QEXPR, which contains the following parameters:

LPAR_INT_QEXPR_TYPE specifies the type of the query combiner (e.g. LKT_QEXPR_AND).
LPAR_LST_QEXPR_ARGS is a list of the score generators (or combiners) whose output is combined or modified by this expression. The elements of this list are typically the Simple, Cognate, and Predicate score generators described above.
LPAR_DBLARR_QUERYLETWEIGHTS (optional) specifies the weight for each score being combined in this expression. The maximum weight is 1.0, with values less than 1.0 representing penalized matches. For instance, supplying weights to an AND combiner produces a weighted average in place of the simple average.
LPAR_STR_QLETGROUP (optional) assigns a group name to this querylet. Group names are used by the GIP prefilter to improve the selection of child records in joined searches. See the “Matching Compound Records and Querylet Grouping” section in the TIBCO Patterns Concepts Guide for more information on querylet grouping and when to use it.
LPAR_LST_QOPTS (optional) contains any general query options for this expression.
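The effect of querylet weights on an AND combiner can be sketched as follows. The sketch assumes the conventional normalized form of a weighted average (the sum of weight times score, divided by the sum of the weights); the exact formula used by the engine is not stated here, and combine_and_weighted is a hypothetical helper, not an API function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an AND combiner with querylet weights.
 * Assumes a normalized weighted average; weights must be in 0.0 to 1.0. */
double combine_and_weighted(const double *scores, const double *weights,
                            size_t n)
{
    double weighted_sum = 0.0;
    double weight_total = 0.0;

    assert(scores != NULL && weights != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(weights[i] >= 0.0 && weights[i] <= 1.0);
        weighted_sum += weights[i] * scores[i];
        weight_total += weights[i];
    }
    assert(weight_total > 0.0);  /* at least one weight must be non-zero */
    return weighted_sum / weight_total;
}
```

Under this assumption, scores of 1.0 and 0.5 with equal weights average to 0.75, while weights of 1.0 and 0.5 pull the result toward the first score, giving roughly 0.83.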

Currently the available members of this list by expression type are:

LKT_QEXPR_AND: LPAR_DBLARR_IGNORE_SCORES is an array of cutoff scores, one for each sub-query of this query expression. The length of the array must equal the number of sub-queries in the expression. Each value must be -1.0 (the special reject record score), -2.0 (a special value indicating that no ignore cutoff processing is to be performed for this sub-query), or in the range 0.0 to 1.0. This provides a means of ignoring sub-queries with poor matches or sub-queries with empty or invalid data. Any sub-query with a score at or below the corresponding score in this array is ignored completely. For example, if three sub-queries score 1.0, 0.7, and 0.1 and the ignore scores array is 0.1, 0.1, 0.1, the third sub-query is ignored and the score is 0.85 (the average of 1.0 and 0.7) instead of 0.6 (the average of all three). To ignore sub-queries on empty or invalid data, set LPAR_DBL_EMPTY_DATA_SCORE and LPAR_DBL_INVALID_DATA_SCORE for the sub-query to -1.0 and set the associated ignore scores array value to -1.0.
 
Warning: Ignore scores should be used with extreme caution. Ignoring a low score in one sub-query is likely to boost the score of a record with a poor match on that sub-query (for example, a score below the threshold) above the score of a similar record with a good match on that sub-query (for example, a score above the threshold). Thus, when used inappropriately, ignore score thresholds tend to push poorer matches above better matches. This is especially true if ignore thresholds are applied to more than one sub-query. Ignore thresholds should therefore be applied only in the rare cases where a low score indicates that the sub-query is completely irrelevant and the record should score above close but imperfect matches on the sub-query. Another valid use is to ignore empty fields as described above, although normally it is better to just let the default score of 0.0 be averaged into the output score.
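The ignore-score arithmetic, and the ranking inversion the warning describes, can be reproduced with a small model. This is a sketch only: and_with_ignore and NO_IGNORE are hypothetical names, the unweighted average is assumed, and the result returned when every sub-query is ignored is an arbitrary choice for the sketch, not documented behavior.

```c
#include <assert.h>
#include <stddef.h>

#define NO_IGNORE (-2.0)  /* hypothetical name for the special -2.0 value */

/* Hypothetical model: average only the sub-query scores that are above
 * their ignore cutoff ("at or below" the cutoff means ignored). */
double and_with_ignore(const double *scores, const double *ignore, size_t n)
{
    double sum = 0.0;
    size_t kept = 0;

    assert(scores != NULL && ignore != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        int ignored = ignore[i] != NO_IGNORE && scores[i] <= ignore[i];
        if (!ignored) {
            sum += scores[i];
            kept++;
        }
    }
    /* Assumption: if every sub-query is ignored, report 0.0. */
    return kept > 0 ? sum / (double)kept : 0.0;
}
```

With cutoffs of 0.1 on every sub-query, a record scoring 1.0, 0.7, 0.1 comes out at 0.85, while a record scoring 1.0, 0.7, 0.5 comes out at about 0.73. The record with the worse third sub-query ranks higher, which is exactly the effect the warning cautions against.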

LKT_QEXPR_AND: LPAR_DBLARR_REJECT_SCORES is an array of cutoff scores similar to LPAR_DBLARR_IGNORE_SCORES except that instead of ignoring the sub-query the record is rejected. (More precisely the output score for the AND expression is set to -1.0, which normally causes the record to be rejected, but could be trapped by a higher level AND expression and ignored as described above for LPAR_DBLARR_IGNORE_SCORES.) LPAR_DBLARR_REJECT_SCORES accepts the same score values with the same meaning as LPAR_DBLARR_IGNORE_SCORES.

If a sub-query is assigned both an ignore score and a reject score, the lesser value applies to scores below it, and the greater value applies to scores in the range from the lesser to the greater. For example, if the reject score for sub-query 1 is 0.2 and the ignore score for sub-query 1 is 0.4, then records with a sub-query 1 score less than 0.2 are rejected, and for records with a sub-query 1 score between 0.2 and 0.4, sub-query 1 is ignored. Conversely, if the ignore score is 0.2 and the reject score is 0.4, then scores less than 0.2 cause sub-query 1 to be ignored and scores from 0.2 to 0.4 cause the record to be rejected.

Warning: When using both ignore scores and reject scores on the same set of sub-queries, note that ignore thresholds tend to favor records with high scores for one sub-query and very low scores for the others, whereas reject scores pass a record only if all sub-queries are above their reject thresholds. Thus combining ignore scores and reject scores can result in few or no records being returned.
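The precedence between an ignore cutoff and a reject cutoff on the same sub-query can be sketched as a simple classification. The names below are hypothetical, not API identifiers. Two details are assumptions of the sketch: a score exactly equal to a cutoff is treated as at or below it, and when both cutoffs are equal the ignore cutoff wins.

```c
#include <assert.h>

typedef enum { SUBQ_KEEP, SUBQ_IGNORE, SUBQ_REJECT } subq_action;

/* Hypothetical model: the lesser cutoff is tested first, then the greater
 * cutoff covers the range between the two. A cutoff of -2.0 never matches,
 * because valid scores are -1.0 or in the range 0.0 to 1.0. */
subq_action classify_subquery(double score, double ignore_cut,
                              double reject_cut)
{
    assert(score >= -1.0 && score <= 1.0);
    if (ignore_cut <= reject_cut) {
        if (score <= ignore_cut)
            return SUBQ_IGNORE;
        if (score <= reject_cut)
            return SUBQ_REJECT;
    } else {
        if (score <= reject_cut)
            return SUBQ_REJECT;
        if (score <= ignore_cut)
            return SUBQ_IGNORE;
    }
    return SUBQ_KEEP;
}
```

With a reject cutoff of 0.2 and an ignore cutoff of 0.4, a score of 0.1 is classified as a reject, 0.3 as an ignore, and 0.5 is kept. Swapping the two cutoffs swaps the first two outcomes.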

LKT_QEXPR_RLINK: LPAR_STR_RLMODELNAME specifies the name of the model to be used.
LKT_QEXPR_RLINK: LPAR_BOOL_USEMODELTHRESH indicates whether the cutoff threshold in the TIBCO Patterns Learn model should be used. If this is set to true, and the named model contains a threshold value, the value is applied as an absolute cutoff score for cutoff processing. This supersedes any cutoff specified in the search options. If the model does not contain a threshold value, this option is ignored. The default value for this option is false. See Dynamic Score Cutoffs for details on cutoff processing.
LKT_QEXPR_RECWGT: LPAR_STR_WGTFLD_NAME specifies the name of the field that provides the weight value to be applied to the score. Alternatively, LPAR_INT_WGTFLD_NUM specifies the field by field number instead of by name.
LKT_QEXPR_MATCH: LPAR_DBL_MATCHSTRENGTH, LPAR_DBLARR_MATCHTHRESHOLDS, LPAR_DBLARR_MATCHREWARDS and LPAR_DBLARR_MATCHPENALTIES are accepted. For a description of these see MatchCase Score Combiner.
LKT_QEXPR_FIRSTVALID: LPAR_DBLARR_CONFIDENCE_CUTOFFS, LPAR_DBLARR_MATCH_CUTOFFS and LPAR_BOOL_INVALID_ONLY are accepted. For a description of these see FirstValid Score Combiner.

Consider the previous example, but with the additional complication that the user now has five text boxes. The first and last names have been isolated into separate entries, but we still want to allow for queries or records where the first and last name have been accidentally transposed. In addition, there is now a gender box which is compared to the record using a predicate. The following code sets up the described query.

lpar_t        ANDquery,ANDqueryargs,simpquery,cogquery,sq,sf,ncw,predicate;
unsigned char *fname,*lname,*address,*ssn,*gender;
unsigned char *name_fields[] = { "First Name", "Last Name" };
unsigned char *ssn_fields[] = { "Social Security Number" };
unsigned char *address_fields[] = { "House Number", "Street Name", "City" };
unsigned char *gender_field = "Gender";
unsigned char *names[2];
int namelens[2];
/* Querylet weights in argument order: name, SSN, address, gender */
double qweights[4] = { 1.0, 0.9, 0.75, 1.0 };
fname = get_fname_box();
lname = get_lname_box();
address = get_address_box();
ssn = get_ssn_box();
gender = get_gender_box();
ANDqueryargs=lpar_create_lst(LPAR_LST_QEXPR_ARGS);
/* Name cogquery */
cogquery =lpar_create_lst(LPAR_LST_COGQUERY);
names[0] = fname;
namelens[0] = strlen(fname);
names[1] = lname;
namelens[1] = strlen(lname);
sq=lpar_create_blkarr(LPAR_BLKARR_SEARCHQUERY,names,namelens,2);
lpar_append_lst(cogquery,sq);
sf=lpar_create_strarr(LPAR_STRARR_FIELDNAMES,name_fields,2);
lpar_append_lst(cogquery,sf);
ncw=lpar_create_dbl(LPAR_DBL_NONCOG_WGT,0.8);
lpar_append_lst(cogquery,ncw);
lpar_append_lst(ANDqueryargs,cogquery);
/* SSN simpquery */
simpquery=lpar_create_lst(LPAR_LST_SIMPLEQUERY);
sq=lpar_create_blk(LPAR_BLK_SEARCHQUERY,ssn,strlen(ssn));
lpar_append_lst(simpquery,sq);
sf=lpar_create_strarr(LPAR_STRARR_FIELDNAMES,ssn_fields,1);
lpar_append_lst(simpquery,sf);
lpar_append_lst(ANDqueryargs,simpquery);
/* Address simpquery */
simpquery=lpar_create_lst(LPAR_LST_SIMPLEQUERY);
sq=lpar_create_blk(LPAR_BLK_SEARCHQUERY,address,strlen(address));
lpar_append_lst(simpquery,sq);
sf=lpar_create_strarr(LPAR_STRARR_FIELDNAMES,address_fields,3);
lpar_append_lst(simpquery,sf);
lpar_append_lst(ANDqueryargs,simpquery);
/* Gender predicate */
predicate = lpar_create_predicate3(
    lpar_create_str(LPAR_STR_PREDFIELD,gender_field),
    PRED_OP_EQUALS,
    lpar_create_str(LPAR_STR_PREDVALUE,gender));
lpar_append_lst(ANDqueryargs,predicate);
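/* Create the AND combiner, then add its weights and its arguments */
ANDquery=lpar_create_lst(LPAR_LST_QEXPR);
lpar_append_lst(ANDquery,
    lpar_create_int(LPAR_INT_QEXPR_TYPE,LKT_QEXPR_AND));
lpar_append_lst(ANDquery,
    lpar_create_dblarr(LPAR_DBLARR_QUERYLETWEIGHTS,qweights,4));
lpar_append_lst(ANDquery,ANDqueryargs);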

First we create the list of queries to be combined. The first two querylets are part of a single cognate query, so the record score is calculated using a single bipartite graph using the two named record fields. The optional parameter, LPAR_DBL_NONCOG_WGT, sets up a penalty factor for non-cognate field matches as described in Cognate Query. The second and third queries are simple queries on the SSN and address fields respectively. Finally the predicate query tests the gender value.

The query combiner is then created and set as an AND combiner. We add the querylet weight array to the query combiner. Here we have decided to rank the name and gender as most important, followed by the SSN field with the address being least important. Finally we add the list of queries that are to be combined. This query is now ready to be used in a call to lkt_dbsearch.

Now suppose we have a music download site. Music is identified by song title and artist. For promotional reasons we wish to favor certain records over others. To do this we add a float field we'll call promotional ranking that contains a value between 0.5 and 1.5. We can set up a query that returns the matches adjusted by the promotional ranking as shown in the following code segment:

lpar_t ANDquery,ANDqueryargs,simpquery,sq,sf,opts;
lpar_t WGTquery,WGTqueryargs;
unsigned char *title_field[] = { "song title" } ;
unsigned char *artist_field[] = { "artist" };
unsigned char *ranking_field[] = { "promotional ranking" };
unsigned char *title ;
unsigned char *artist ;
title = get_title_box();
artist = get_artist_box();
ANDqueryargs=lpar_create_lst(LPAR_LST_QEXPR_ARGS);
/* Title simpquery */
simpquery=lpar_create_lst(LPAR_LST_SIMPLEQUERY);
sq=lpar_create_blk(LPAR_BLK_SEARCHQUERY,title,strlen(title));
lpar_append_lst(simpquery,sq);
sf=lpar_create_strarr(LPAR_STRARR_FIELDNAMES,title_field,1);
lpar_append_lst(simpquery,sf);
lpar_append_lst(ANDqueryargs,simpquery);
/* Artist simpquery */
simpquery=lpar_create_lst(LPAR_LST_SIMPLEQUERY);
sq=lpar_create_blk(LPAR_BLK_SEARCHQUERY,artist,strlen(artist));
lpar_append_lst(simpquery,sq);
sf=lpar_create_strarr(LPAR_STRARR_FIELDNAMES,artist_field,1);
lpar_append_lst(simpquery,sf);
lpar_append_lst(ANDqueryargs,simpquery);
/* Create And of artist and title simple queries */
ANDquery=lpar_create_lst(LPAR_LST_QEXPR);
lpar_append_lst(ANDquery,lpar_create_int(LPAR_INT_QEXPR_TYPE,LKT_QEXPR_AND));
lpar_append_lst(ANDquery,ANDqueryargs);
/* Now apply promotional rating to combined scores */
WGTquery=lpar_create_lst(LPAR_LST_QEXPR);
lpar_append_lst(WGTquery, lpar_create_int(LPAR_INT_QEXPR_TYPE,LKT_QEXPR_RECWGT));
opts = lpar_create_lst(LPAR_LST_QOPTS);
lpar_append_lst(opts, lpar_create_str(LPAR_STR_WGTFLD_NAME,ranking_field[0]));
lpar_append_lst(WGTquery, opts);
WGTqueryargs = lpar_create_lst(LPAR_LST_QEXPR_ARGS);
lpar_append_lst(WGTqueryargs,ANDquery);
lpar_append_lst(WGTquery, WGTqueryargs);

Now WGTquery is ready to be passed to lkt_dbsearch. The raw score of each record is multiplied by the value of the promotional ranking field to produce the final score used in selecting and ranking the records for return. The score returned is the score after adjustment. The original score is available in the LPAR_DBLARR_MATCHSCORE_QLT value.
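The adjustment itself is simple arithmetic and can be modeled in plain C. This is a sketch under stated assumptions: apply_record_weight is a hypothetical helper, not an API function, and it only models the multiply-and-constrain behavior described for LKT_QEXPR_RECWGT for raw scores in the range 0.0 to 1.0.

```c
#include <assert.h>

/* Hypothetical model of LKT_QEXPR_RECWGT: multiply the raw score by the
 * record's weight field, then constrain the result to 0.0 through 1.0. */
double apply_record_weight(double raw_score, double weight)
{
    double adjusted;

    assert(raw_score >= 0.0 && raw_score <= 1.0);
    adjusted = raw_score * weight;
    if (adjusted < 0.0)
        adjusted = 0.0;
    if (adjusted > 1.0)
        adjusted = 1.0;
    return adjusted;
}
```

For a raw score of 0.8, a promotional ranking of 0.5 yields 0.4, while a ranking of 1.5 yields 1.0 because the product is capped. One consequence of the cap is that strongly promoted records with good raw scores all tie at 1.0, so rankings above 1.0 mainly help records whose raw scores are moderate.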