Complex Queries

Earlier sections described the following ways of comparing queries or other “given” data (such as constant values in predicate expressions) with the attributes of records in a table:

•

Simple queries compare an unstructured text string against one or more fields.

•

Cognate queries compare a set of querylets (query segments) against a corresponding set of closely related fields subject to frequent misfielding.

•

Variable Attribute queries match a list of Variable Attribute values.

•

Date comparisons are optimized for inexact comparison of date values.

•

Predicate queries evaluate a predicate expression over one or more attributes of a record.

Each of these five comparison methods can be thought of as a score generator: each outputs a value between 0.0 and 1.0 that indicates the degree of similarity between the query and one or more record attributes (or, in the case of most predicate queries, whether or not the record satisfies the predicate).

In a complex query, the scores output from two or more score generators are combined into a single overall score using one or more score combiners. The two most commonly used score combiners are the logical query operators AND and OR, which are illustrated with examples in Scenario 5: Alternate Conditions.

The AND operator combines the scores output by a set of score generators into a single composite score by computing the weighted average of those scores. By default, each input score receives a weight of 1.0, and the output score of the AND operator is just the simple arithmetic mean of the input scores. By decreasing some of these weights to positive values less than 1.0, you can decrease the relevancy of some of the input scores relative to the others.
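The weighted average described above can be sketched in a few lines of Python. This illustrates the arithmetic only and is not the product's API; the function name `and_combiner` and the sample scores and weights are hypothetical.

```python
def and_combiner(scores, weights=None):
    """Weighted average of the input scores (hypothetical helper).

    Each weight defaults to 1.0, which reduces the result to the
    simple arithmetic mean of the scores.
    """
    if not scores:
        raise ValueError("at least one input score is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("exactly one weight per score is required")
    if any(w <= 0.0 for w in weights):
        raise ValueError("weights must be positive")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Illustrative scores from three score generators.
name_score, address_score, phone_score = 0.9, 0.6, 0.3

# Default weights: the simple mean of the three scores.
equal = and_combiner([name_score, address_score, phone_score])

# Halving the weight of the weak phone score reduces its influence.
down_weighted = and_combiner(
    [name_score, address_score, phone_score],
    [1.0, 1.0, 0.5],
)

print(round(equal, 2), round(down_weighted, 2))
```

With these sample values, reducing the phone weight raises the composite score, because the lowest input now contributes less to the average.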

The OR operator combines the scores output by a set of score generators into a single composite score by computing their maximum.
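As a minimal sketch of this behavior (again only the arithmetic, with a hypothetical function name and made-up scores), the OR combiner keeps the best of its inputs:

```python
def or_combiner(scores):
    """Return the highest of the input scores (hypothetical helper)."""
    if not scores:
        raise ValueError("at least one input score is required")
    return max(scores)


# One phone number querylet scored against three phone fields.
home_score, work_score, cell_score = 0.35, 0.92, 0.10

best = or_combiner([home_score, work_score, cell_score])
print(best)
```

A poor match in the other fields does not lower the result, which is the main difference from the averaging behavior of AND.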

As described in the Introduction, the AND and OR logical query operators, because they work as score combiners, “soften” the characteristics of the Boolean connectives of conventional query languages. The TIBCO Patterns - Search AND operator outputs a measure of the total amount of similarity, based on weighted contributions from different comparisons. Analogously, the TIBCO Patterns - Search OR operator selects the greatest amount of similarity from a set of comparisons that might all be “inexact.”

Score “trees”

The output of a score combiner can be one of the inputs to another score combiner. This allows you to combine a set of scores in one way, say with an OR operator, and then combine the resulting score with other scores using a different score combiner, say an AND operator. For example, you can use the OR operator to select the best score from the comparison of a name with several alternate name or alias fields, and then feed that score to an AND operator along with the output scores from other simple or cognate queries.

A complex query consists of a tree-like set of relationships, in which the individual scores output by simple, cognate, Variable Attribute, date, or predicate queries represent the “leaves” of the tree. These scores are progressively combined using score combiners until a single overall score is output for the record (this represents the “root” of the tree).
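The tree structure can be modeled with a small recursive evaluator. The sketch below is an assumption-laden illustration, not the product's query representation: the class names `Leaf`, `Or`, and `And`, the `evaluate` function, and all scores are hypothetical. It builds the alias example described above, where an OR over several name comparisons feeds an AND.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    """Score already produced by a score generator (hypothetical type)."""
    label: str
    score: float


@dataclass
class Or:
    """Score combiner that takes the maximum of its children."""
    children: List["Node"]


@dataclass
class And:
    """Score combiner that takes the weighted average of its children."""
    children: List["Node"]
    weights: Optional[List[float]] = None  # defaults to 1.0 per child


Node = Union[Leaf, Or, And]


def evaluate(node: Node) -> float:
    """Combine scores from the leaves up to the root."""
    if isinstance(node, Leaf):
        return node.score
    child_scores = [evaluate(child) for child in node.children]
    if isinstance(node, Or):
        return max(child_scores)
    weights = node.weights or [1.0] * len(child_scores)
    return sum(s * w for s, w in zip(child_scores, weights)) / sum(weights)


# The name is compared with a primary name field and two alias fields;
# the best of those scores is then averaged with an address score.
tree = And([
    Or([
        Leaf("name vs. primary name", 0.55),
        Leaf("name vs. alias 1", 0.95),
        Leaf("name vs. alias 2", 0.20),
    ]),
    Leaf("address", 0.75),
])

root_score = evaluate(tree)
print(round(root_score, 2))
```

Because combiners accept other combiners as children, the same evaluator handles trees of any depth.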

The following example defines an overall query consisting of three name querylets, a street address querylet, a querylet combining city or ZIP code information, a phone number querylet, and a date-of-birth querylet. The table to be searched has fields that include three name fields, two street address fields, a city field, a ZIP code field, three different phone number fields, and a date-of-birth field.

The figure illustrates one possible design for a complex query that combines seven separate query comparisons.

•

A cognate query compares the three name querylets against the three name fields. Note how the field weights and the non-cognate weight are set.

•

A simple query compares the address querylet against the two lines of the street address.

•

Another simple query compares the city or ZIP querylet against those two fields in the record. (Note that field weights are not set for either of these two simple queries in the example.)

•

To handle the phone number, three simple queries match the single phone number querylet against each of the three phone number fields, and an OR operator selects the best of the three scores.

•

For the date of birth, the special comparison method for dates is used.

•

Finally, the five resulting scores are combined using an AND operator with the specified set of weights to obtain the overall score for the record.
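The arithmetic behind this design can be walked through with made-up numbers. In the sketch below, every leaf score and every AND weight is a hypothetical value chosen for illustration (the weights in the figure are not reproduced here), and `weighted_and` is an assumed helper rather than a product function.

```python
def weighted_and(scores, weights):
    """Weighted average of scores (hypothetical helper)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Seven leaf scores, one per query comparison (all values invented).
name_score = 0.92          # cognate query over three name fields
street_score = 0.80        # simple query over two street address lines
city_zip_score = 1.00      # simple query over city and ZIP fields
phone_scores = [0.10, 0.88, 0.40]  # one simple query per phone field
dob_score = 0.95           # date comparison

# The three phone scores collapse to one: the best match wins.
phone_score = max(phone_scores)

# Five scores remain and are averaged using illustrative weights.
overall = weighted_and(
    [name_score, street_score, city_zip_score, phone_score, dob_score],
    [1.0, 1.0, 0.5, 0.5, 1.0],
)
print(round(overall, 4))
```

Seven comparisons therefore yield five inputs to the final AND, because the three phone comparisons contribute only their maximum.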

Here are some additional considerations related to complex queries:

•

In general, two kinds of structured queries suggest the construction of a complex query:

—

Structured queries over diverse attribute types, especially when these also have diverse relevance for the match. In this case, separate simple queries might be constructed for single fields (and cognate queries for groups of closely related fields subject to frequent misfielding), and the resulting scores combined using the AND combiner along with a set of weights to use in computing the weighted average.

—

A querylet that requires comparison against each of several alternative fields, where each field represents a possible alternate form of the entire query as opposed to a different portion of the query. For example, a record might contain multiple fields for a phone number (home, work, cell, second line), while a query might contain a single phone number querylet that should be matched against any of the phone number fields in the record. In this case, separate simple or cognate queries might be constructed for each alternative, and then the highest score can be selected using the OR combiner.

•

In some complex-query situations, you might have a large number of query inputs (each translating into a querylet), many of which are optional. (A common case might be implementing a search service.) If a record has no value for one of these optional fields, or has a value that matches very poorly, you might not want to penalize the record for failing to match that optional value. In cases such as these, you can direct TIBCO Patterns - Search to ignore the querylet’s contribution to the score for that record. See the TIBCO Patterns - Search Programmer’s Guide for more details and appropriate cautions relating to this “ignore scores” feature.

•

In some complex-query situations, you might have a large number of query inputs (each translating into a querylet), some of which are mandatory. (In other words, you do not want any records that do not have very high scores in that field.) In these situations, you can direct TIBCO Patterns - Search to reject records that have scores less than a threshold value for such a mandatory querylet. See the TIBCO Patterns - Search Programmer’s Guide for more information and appropriate cautions relating to this “reject scores” feature.

•

Every query must contain at least one Simple, Cognate, or Variable Attribute query, or a Date query against a searchable date field. A query cannot consist solely of predicate queries, date queries against non-searchable date fields, and score combiners.
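The “ignore scores” and “reject scores” ideas described in the list above can be illustrated conceptually. The sketch below is one plausible reading only, under loudly stated assumptions: the `Querylet` type, its `ignore_below` and `reject_below` thresholds, the `combine` function, and the choice to return 0.0 when every score is ignored are all hypothetical. The actual behavior and options are defined in the TIBCO Patterns - Search Programmer’s Guide.

```python
from typing import List, NamedTuple, Optional


class Querylet(NamedTuple):
    """Score plus optional thresholds (hypothetical structure)."""
    score: float
    weight: float = 1.0
    ignore_below: Optional[float] = None  # optional input: drop if weaker
    reject_below: Optional[float] = None  # mandatory input: reject if weaker


def combine(querylets: List[Querylet]) -> Optional[float]:
    """Weighted average with assumed ignore and reject handling.

    Returns None when the record is rejected outright.
    """
    kept = []
    for q in querylets:
        if q.reject_below is not None and q.score < q.reject_below:
            return None  # mandatory querylet failed: reject the record
        if q.ignore_below is not None and q.score < q.ignore_below:
            continue  # optional querylet matched poorly: do not penalize
        kept.append(q)
    if not kept:
        return 0.0  # arbitrary choice for this sketch
    return sum(q.score * q.weight for q in kept) / sum(q.weight for q in kept)


# Record A passes the mandatory check; its weak optional score is ignored.
record_a = [
    Querylet(0.9, reject_below=0.8),
    Querylet(0.1, ignore_below=0.3),
    Querylet(0.7),
]

# Record B fails the mandatory check and is rejected.
record_b = [
    Querylet(0.5, reject_below=0.8),
    Querylet(0.9),
]

score_a = combine(record_a)
score_b = combine(record_b)
print(score_a, score_b)
```

In this reading, ignoring a score removes both the score and its weight from the average, so the record is judged only on the inputs that remain.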