Weighting Fields
Field weights allow you to alter the importance of fields in a record or querylet.
Field weights can be attached to an LPAR_LST_SIMPLEQUERY, LPAR_LST_COGQUERY, LPAR_LST_ATTRQUERY, or an LPAR_LST_QUERYLET inside an old-style multi-query. The weights are contained in an LPAR_DBLARR_QFIELDWEIGHTS and are double-precision floating-point numbers in the range 0.0 (no weight) to 1.0 (maximum weight). Each element of the array corresponds one-to-one with the selected fields of that individual querylet. Do not include weights for non-participating fields.
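The constraints above (one weight per selected field, each weight between 0.0 and 1.0) can be checked before a query is built. The following is a minimal sketch only: `check_field_weights` and its arguments are hypothetical names used for illustration and are not part of the API described here.

```python
from typing import Sequence


def check_field_weights(selected_fields: Sequence[str],
                        weights: Sequence[float]) -> None:
    """Hypothetical helper: validate a weight list against the selected fields.

    Raises ValueError if the list lengths differ or a weight is out of range.
    """
    # One weight per selected field; non-participating fields get no entry.
    if len(weights) != len(selected_fields):
        raise ValueError(
            f"expected {len(selected_fields)} weights, got {len(weights)}"
        )
    # Each weight must lie in the closed range 0.0 to 1.0.
    for name, weight in zip(selected_fields, weights):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for field '{name}' is out of range: {weight}")


# Two selected fields, two weights in range: passes silently.
check_field_weights(["first", "last"], [0.5, 1.0])
```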
Querylet-specific field weights let you adjust the importance of fields on a querylet-by-querylet basis. For old-style multi-queries, this was generally used to set the cognate vs. non-cognate field weighting (see Cognate Query and Old-style Query Construction for more details). For new-style simple and cognate queries, it provides a more flexible means of setting the importance of fields, should you need it.
Field weights behave somewhat differently depending on which query structure they are applied to. For a simple query, field weights are penalizing: they reduce the overall score of the match. For cognate and attribute queries, field weights are structural: they change the relative importance of matches within the fields but do not penalize the final score. In a simple query, if a query value matches in a field with a weight of 0.8, the highest score that match can have is 0.8, even if it is a perfect match. In a cognate query or an attributes query, a perfect match in a field with a weight of 0.8 still receives a final score of 1.0. The following example might help in understanding what the different kinds of weights do.
We run a query against the record:
| first | last |
|-------|------|
| Bob   | Taf  |
We use both a simple query (penalizing weights) and a cognate query (structural weights). The results for two different queries with various combinations of weights are:
| Query   | Query Type | Weights (first, last) | Score   |
|---------|------------|-----------------------|---------|
| Bob Taf | Simple     | 1.0, 1.0              | 1.0     |
| Bob Taf | Simple     | 0.5, 1.0              | 0.75    |
| Bob Taf | Simple     | 1.0, 0.5              | 0.75    |
| Bob Taf | Cognate    | 1.0, 1.0              | 1.0     |
| Bob Taf | Cognate    | 0.5, 1.0              | 1.0     |
| Bob Taf | Cognate    | 1.0, 0.5              | 1.0     |
| Jim Taf | Simple     | 1.0, 1.0              | 0.5     |
| Jim Taf | Simple     | 0.5, 1.0              | 0.5     |
| Jim Taf | Simple     | 1.0, 0.5              | 0.25    |
| Jim Taf | Cognate    | 1.0, 1.0              | 0.5     |
| Jim Taf | Cognate    | 0.5, 1.0              | 0.66667 |
| Jim Taf | Cognate    | 1.0, 0.5              | 0.33333 |
The first thing to note is that, even for an exact match, the penalizing weights used by a simple query lower the score, whereas with the structural weights used by a cognate query an exact match always scores 1.0.
In the second query (Jim Taf), we have a complete mismatch on the first name field and a perfect match on the last name field. With equal weights, both the simple and the cognate query score 0.5.
With the penalizing weights of the simple query, lowering the weight of the first name field has no effect on the score, because that field's contribution is already zero. With the structural weights of the cognate query, however, the score rises when the weight of the first name field is lowered. With a weight of 0.5 on the first name and 1.0 on the last name, the last name now represents two thirds of the query, so the score is two thirds of the perfect score of 1.0.
If we reverse the weighting, the perfect match on last name in the simple query is penalized by a factor of 0.5 and the score on the first name is still zero, so the final score is reduced to 0.25. With structural weights, the last name now represents only one third of the query, so the score is reduced to one third of a perfect match.
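The arithmetic behind the scores in the example can be sketched in a few lines. This is a simplified model inferred from the example values, not the engine's actual scoring algorithm: it assumes each field yields a match score between 0.0 and 1.0, that a simple query averages the weight-scaled scores over the number of fields, and that a cognate query normalizes by the sum of the weights. The function names are hypothetical.

```python
from typing import Sequence


def penalizing_score(match_scores: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Assumed simple-query model: scale each field score by its weight,
    then average over the number of fields."""
    if len(match_scores) != len(weights):
        raise ValueError("one weight per field is required")
    scaled = [score * weight for score, weight in zip(match_scores, weights)]
    return sum(scaled) / len(scaled)


def structural_score(match_scores: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Assumed cognate-query model: weighted average of the field scores,
    normalized by the total weight so a perfect match always scores 1.0."""
    if len(match_scores) != len(weights):
        raise ValueError("one weight per field is required")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    scaled = [score * weight for score, weight in zip(match_scores, weights)]
    return sum(scaled) / total_weight


# Per-field match scores (first, last) against the record "Bob", "Taf".
bob_taf = [1.0, 1.0]  # both fields match exactly
jim_taf = [0.0, 1.0]  # first name mismatches, last name matches

for query_name, scores in (("Bob Taf", bob_taf), ("Jim Taf", jim_taf)):
    for weights in ([1.0, 1.0], [0.5, 1.0], [1.0, 0.5]):
        print(query_name, weights,
              "simple:", round(penalizing_score(scores, weights), 5),
              "cognate:", round(structural_score(scores, weights), 5))
```

Under these assumptions the loop reproduces the scores listed for the record above. A real cognate query also considers matches across fields, which this sketch ignores.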
Why do we have different types of weights? With a cognate query we fully expect every field to be matched. The field set represents parts of a whole. By using weights we adjust the relative importance of the parts, but we do not want to penalize a score for matching a part. Simple queries over multiple fields are often used where it is unclear which field is to be matched. The fields do not necessarily represent parts of a whole; they might represent alternatives, each field potentially being a whole in itself. In this case we might prefer matches in one field over another, so we penalize the less preferred fields. The same arguments hold for the AND vs. the OR score combiners. An AND combines parts of a whole, so its weights are structural; an OR combines alternatives, so penalizing weights apply.
Below is a table of the different weights and whether they are structural or penalizing.
| Weight                                       | Type       |
|----------------------------------------------|------------|
| LPAR_DBLARR_QUERYLETWEIGHTS (AND)            | Structural |
| LPAR_DBLARR_QUERYLETWEIGHTS (OR)             | Penalizing |
| LPAR_DBL_THESAURUSWEIGHT                     | Penalizing |
| LPAR_DBLARR_QFIELDWEIGHTS (Simple Query)     | Penalizing |
| LPAR_DBLARR_QFIELDWEIGHTS (Cognate Query)    | Structural |
| LPAR_DBLARR_QFIELDWEIGHTS (Attributes Query) | Structural |
| LPAR_DBL_NONCOG_WGT                          | Penalizing |