Weighting Factors
Several weights, or weighting factors, can be applied to a query. Each of these weights belongs to one of the following types:
• Penalizing Weights: Weights that lower the final score. That is, the score is penalized by the given factor when the weight is applied.
• Structural Weights: Weights that adjust the relative importance of different portions of a query, but do not impose a penalty.
The following example illustrates the difference between these two types of weights.
Run a query against the following record:
| First | Last |
| Bob | Taf |
Then, use both a simple query (penalizing weights) and a cognate query (structural weights).
The results for two different queries with different combinations of weights are shown in the following table.
| Query | Query Type | First Name Weight | Last Name Weight | Score |
| Bob Taf | Simple | 1.0 | 1.0 | 1.0 |
| Bob Taf | Simple | 0.5 | 1.0 | 0.75 |
| Bob Taf | Simple | 1.0 | 0.5 | 0.75 |
| Bob Taf | Cognate | 1.0 | 1.0 | 1.0 |
| Bob Taf | Cognate | 0.5 | 1.0 | 1.0 |
| Bob Taf | Cognate | 1.0 | 0.5 | 1.0 |
| Jim Taf | Simple | 1.0 | 1.0 | 0.5 |
| Jim Taf | Simple | 0.5 | 1.0 | 0.5 |
| Jim Taf | Simple | 1.0 | 0.5 | 0.25 |
| Jim Taf | Cognate | 1.0 | 1.0 | 0.5 |
| Jim Taf | Cognate | 0.5 | 1.0 | 0.66667 |
| Jim Taf | Cognate | 1.0 | 0.5 | 0.33333 |
In the first example (query Bob Taf), both fields match exactly. Even so, a penalty weight, as used by the simple query, lowers the score, whereas with a structural weight, as used by the cognate query, an exact match always scores 1.0.
In the second example (query Jim Taf), the first name field is a complete mismatch and the last name field is a perfect match.
With equal weights, both the simple and the cognate query score 0.5. With the penalty weights of the simple query, lowering the weight of the first name field does not affect the score: that field's match score is already zero, and zero times any weight is still zero. With the structural weights of the cognate query, however, lowering the weight of the first name field raises the score. With a weight of 0.5 on the first name and 1.0 on the last name, the last name represents two-thirds of the query (1.0 out of a total weight of 1.5), so the score is two-thirds of the perfect score of 1.0.
If you reverse the weighting, the simple query penalizes the perfect match on the last name by a factor of 0.5, and the score on the first name is still zero, so the final score is reduced to 0.25. With structural weights, the last name represents only one-third of the query (0.5 out of a total weight of 1.5), so the score is reduced to one-third of a perfect match.
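The arithmetic behind the example table can be reproduced with a short sketch. The two formulas below are inferred from the scores in the table and are not taken from the product itself: the penalizing case averages the per-field scores after multiplying each by its weight, and the structural case takes a weighted average normalized by the total weight. The function names and the per-field scores (1.0 for an exact match, 0.0 for a complete mismatch) are assumptions made for illustration.

```python
def simple_score(field_scores, weights):
    """Penalizing weights (assumed formula): multiply each field score by
    its weight, then average over the number of fields."""
    penalized = [w * s for w, s in zip(weights, field_scores)]
    return sum(penalized) / len(penalized)


def cognate_score(field_scores, weights):
    """Structural weights (assumed formula): weighted average of the field
    scores, normalized by the total weight."""
    return sum(w * s for w, s in zip(weights, field_scores)) / sum(weights)


# Assumed per-field match scores (first name, last name) against the
# record "Bob Taf".
BOB_TAF = (1.0, 1.0)  # both fields match exactly
JIM_TAF = (0.0, 1.0)  # first name mismatches, last name matches

if __name__ == "__main__":
    for query, scores in (("Bob Taf", BOB_TAF), ("Jim Taf", JIM_TAF)):
        for weights in ((1.0, 1.0), (0.5, 1.0), (1.0, 0.5)):
            print(query, weights,
                  "simple:", round(simple_score(scores, weights), 5),
                  "cognate:", round(cognate_score(scores, weights), 5))
```

Under these assumptions, the printed values agree with the Score column of the table for all twelve rows.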
Different Types of Weights
With a cognate query, you expect every field to match. The field set represents parts of a whole. By using weights, you adjust the relative importance of the parts, but you do not penalize the total score for matching a particular part.
Simple queries over multiple fields are often used where it is unclear which field is to be matched. The fields do not necessarily represent parts of a whole; they might represent alternatives, each field potentially being whole in itself. In this case, you might prefer matches in one field over another, so you can penalize the less preferred fields.
The same argument applies to the AND and OR score combiners. AND combines parts of a whole, so its weights are structural weights. OR combines alternatives, so penalty weights apply. Thesauri and weighted dictionaries are covered in Thesaurus and Term Weighting.
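The contrast between combining parts of a whole and choosing among alternatives can be sketched as follows. This is an illustration only: the text states that AND weights are structural and OR weights are penalizing, but it does not give the combining formulas. The sketch assumes that AND takes a weighted average and that OR takes the best penalized alternative; both function names and the example weights are hypothetical.

```python
def and_score(scores, weights):
    """Parts of a whole (assumed formula): structural weights set each
    part's share of the total, so a full match on every part scores 1.0."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def or_score(scores, weights):
    """Alternatives (assumed formula): each alternative's score is
    penalized by its weight and the best penalized alternative wins."""
    return max(w * s for w, s in zip(weights, scores))


# Hypothetical weights: the first alternative is preferred (1.0),
# the second is less preferred (0.5).
WEIGHTS = (1.0, 0.5)

if __name__ == "__main__":
    # An exact match found only in the preferred alternative keeps its score.
    print(or_score((1.0, 0.0), WEIGHTS))
    # The same exact match found only in the less preferred alternative
    # is penalized by that alternative's weight.
    print(or_score((0.0, 1.0), WEIGHTS))
    # With structural weights, matching every part is never penalized.
    print(and_score((1.0, 1.0), WEIGHTS))
```

With these assumed formulas, an exact match in the less preferred alternative ranks below the same match in the preferred one, while a complete match across all parts of an AND still reaches a perfect score.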
The following table lists each weight and whether it is structural or penalizing.
| Weight | Weight Type |
| Simple Query Field Weights | Penalizing |
| Cognate Query Field Weights | Structural |
| Cognate Query Non-Cognate Weight | Penalizing |
| Attributes Query Attribute Weights | Structural |
| AND Querylet Weight | Structural |
| OR Querylet Weight | Penalizing |
| Classic Thesaurus Weight | Penalizing |
| Weighted Term Semantic Weight | Structural |
| Combined Thesaurus Weight | Penalizing |
| Combined Thesaurus Semantic Term Weight | Structural |