Weighting Factors

Several weights, or weighting factors, can be applied. Each weight belongs to one of the following two types:

Penalizing Weights: Weights that lower the final score. That is, the score is penalized by the given factor when the weight is applied.
Structural Weights: Weights that adjust the relative importance of different portions of a query, but do not impose a penalty.

The following example illustrates the difference between these two types of weights.

Run a query against the following record:

First  Last
-----  ----
Bob    Taf

Then, use both a simple query (penalizing weights) and a cognate query (structural weights).

The results for two different queries, each run with several combinations of weights, are shown in the following table.

Query    Query Type  First Name Weight  Last Name Weight  Score
-------  ----------  -----------------  ----------------  -------
Bob Taf  Simple      1.0                1.0               1.0
Bob Taf  Simple      0.5                1.0               0.75
Bob Taf  Simple      1.0                0.5               0.75
Bob Taf  Cognate     1.0                1.0               1.0
Bob Taf  Cognate     0.5                1.0               1.0
Bob Taf  Cognate     1.0                0.5               1.0
Jim Taf  Simple      1.0                1.0               0.5
Jim Taf  Simple      0.5                1.0               0.5
Jim Taf  Simple      1.0                0.5               0.25
Jim Taf  Cognate     1.0                1.0               0.5
Jim Taf  Cognate     0.5                1.0               0.66667
Jim Taf  Cognate     1.0                0.5               0.33333

In the first example (Bob Taf), the query matches the record exactly. Even so, a penalizing weight below 1.0, as used by the simple query, lowers the score. With a structural weight, as used by the cognate query, an exact match always has a score of 1.0.

In the second example (Jim Taf), there is a complete mismatch on the first name field and a perfect match on the last name field.

With equal weights, both simple and cognate queries have a score of 0.5. With the penalty weights of the simple query, lowering the weight of the first name field does not affect the score, as it is already zero and zero times anything is still zero. But with the structural weights of the cognate query, the score is raised when the weight of the first name field is lowered. With a weight of 0.5 on the first name and 1.0 on the last name, the last name represents two thirds of the query, so the score is now two thirds of the perfect score of 1.0.

If you reverse the weighting, the perfect match on the last name is penalized by a factor of 0.5 in the simple query and the score on the first name is still zero, so the final score is reduced to 0.25. For the structural weights, the last name represents only one third of the query so the score is reduced to just one third of a perfect match.
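The arithmetic behind these scores can be sketched in a few lines. The two formulas below are assumptions inferred from the values in the table (they reproduce every row), not a statement of how the product computes scores internally, and the function names are hypothetical. Each field score is taken to be 1.0 for an exact match and 0.0 for a complete mismatch.

```python
from typing import Sequence


def penalizing_score(field_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Simple-query style (assumed formula): scale each field score by its
    weight, then take the plain average. A weight below 1.0 can only lower
    the result."""
    penalized = [score * weight for score, weight in zip(field_scores, weights)]
    return sum(penalized) / len(penalized)


def structural_score(field_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Cognate-query style (assumed formula): weighted average of the field
    scores. The weights are normalized by their sum, so they change only the
    relative importance of each field."""
    weighted = sum(score * weight for score, weight in zip(field_scores, weights))
    return weighted / sum(weights)


# Field scores are (first name, last name) against the record Bob Taf.
bob_taf = [1.0, 1.0]  # exact match on both fields
jim_taf = [0.0, 1.0]  # mismatch on first name, exact match on last name

for label, scores in (("Bob Taf", bob_taf), ("Jim Taf", jim_taf)):
    for weights in ([1.0, 1.0], [0.5, 1.0], [1.0, 0.5]):
        print(
            label,
            weights,
            round(penalizing_score(scores, weights), 5),
            round(structural_score(scores, weights), 5),
        )
```

Because the structural form divides by the sum of the weights, scaling every weight by the same factor leaves the score unchanged. The penalizing form has no such normalization, which is why an exact match scores below 1.0 as soon as any weight drops below 1.0.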

Different Types of Weights

With a cognate query, you expect every field to match. The field set represents parts of a whole. By using weights, you adjust the relative importance of the parts, but do not penalize the total score for matching a particular part.

Simple queries over multiple fields are often used where it is unclear which field is to be matched. The fields do not necessarily represent parts of a whole. They might instead represent alternatives, with each field potentially being whole in itself. In this case, you might prefer matches in one field over another, so you can penalize the less preferred fields.

The same argument applies to the AND and OR score combiners. The AND combines parts of a whole, so its weights are structural weights. The OR chooses among alternatives, so penalizing weights apply.

Thesauri and weighted dictionaries are covered in Thesaurus and Term Weighting.
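The contrast between the AND and OR score combiners can be illustrated with a small sketch. The text above states only which kind of weight each combiner uses. The combining rules here are assumptions made for illustration: AND is modeled as a weighted average of its querylet scores, and OR as the best querylet score after each has been multiplied by its weight. The function names are hypothetical and are not part of any product API.

```python
from typing import Sequence


def and_combine(querylet_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Parts of a whole (assumed rule): weighted average, so the weights are
    structural and a perfect match on every part still yields 1.0."""
    weighted = sum(s * w for s, w in zip(querylet_scores, weights))
    return weighted / sum(weights)


def or_combine(querylet_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Alternatives (assumed rule): each score is penalized by its weight and
    the best penalized alternative wins."""
    return max(s * w for s, w in zip(querylet_scores, weights))


# Two querylets; the first is the less preferred one (weight 0.5).
weights = [0.5, 1.0]

# Both querylets match perfectly.
print(and_combine([1.0, 1.0], weights), or_combine([1.0, 1.0], weights))

# Only the less preferred querylet matches.
print(and_combine([1.0, 0.0], weights), or_combine([1.0, 0.0], weights))
```

Under these assumptions, a perfect match reached only through the less preferred alternative is capped at that alternative's weight by the OR, while the AND never lowers a complete match, whatever the weights are.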

The following table lists the different weights and indicates whether each is structural or penalizing.

Weight                                   Weight Type
---------------------------------------  -----------
Simple Query Field Weights               Penalizing
Cognate Query Field Weights              Structural
Cognate Query Non-Cognate Weight         Penalizing
Attributes Query Attribute Weights       Structural
AND Querylet Weight                      Structural
OR Querylet Weight                       Penalizing
Classic Thesaurus Weight                 Penalizing
Weighted Term Semantic Weight            Structural
Combined Thesaurus Weight                Penalizing
Combined Thesaurus Semantic Term Weight  Structural