Match Visualization

In most cases, users of TIBCO Patterns servers profit greatly from the ability to see which portions of a matching record matched the query, and with what intensity. We call this match visualization. Typically, different colors and/or type sizes are used to highlight characters of matching records according to their relative match intensity.

When you include the LPAR_INT_VISUALSTYLE parameter set to a value of 2 in the srchpars parameter list that you pass to the lkt_dbsearch command, lkt_dbsearch will output visualization information. You'll find it, along with other match-specific information, in each of the match information lists contained in the minfo output list (see Invoking TIBCO Patterns Matching (lkt_dbsearch)).

Each minfo list contains four array lpars: LPAR_DBLARR_V2V, LPAR_INTARR_V2P, LPAR_INTARR_N, and LPAR_INTARR_V2D (known respectively as the V array, the P array, the N array, and the D array). The length of each array equals the combined length of all searchable text fields of the matching record.

Note: In the case of queries subject to field selection, this length includes the lengths of all searchable text fields, both those selected for searching and those not selected. Non-searchable text fields, variable attributes fields, and integer, double, and date fields are NOT included.

The first element of each array applies to the first character of text, the second element of the arrays applies to the second character of text, and so on. In other words, there is a V value, P value, N value, and a D value for each character of searchable text in a record.

It is important to note that the visualization vectors are based on character positions, not byte positions. If the original data was UTF-8 encoded, the length of each visualization vector corresponds to the number of characters in the original data, which might differ from the number of bytes. In all of the discussion below, references to character position mean exactly that: character, not byte.
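The character-versus-byte distinction can be illustrated with a minimal Python sketch. It does not call the TIBCO Patterns API. The helper name pair_chars_with_scores and the sample text are hypothetical, and the sketch assumes that one "character" corresponds to one Unicode code point, which the text above does not state explicitly.

```python
def pair_chars_with_scores(text: str, v_values: list[float]) -> list[tuple[str, float]]:
    """Pair each character of searchable text with its V value.

    Hypothetical helper: v_values stands in for a V array that has
    already been extracted from a minfo list.
    """
    if len(text) != len(v_values):
        raise ValueError("V array length must equal the number of characters")
    return list(zip(text, v_values))


# Sample text with two non-ASCII characters (e-diaeresis and u-diaeresis).
text = "Zo\u00eb M\u00fcller"
utf8_bytes = text.encode("utf-8")

# Index the visualization arrays by character count, not by byte count.
print(len(text))        # number of characters (code points)
print(len(utf8_bytes))  # number of bytes in the UTF-8 encoding
```

Here the sample text has 10 characters but occupies 12 bytes in UTF-8, so an application that indexes the arrays by byte offset would misalign every highlight after the first non-ASCII character.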

Each (V, P, N, D) value set provides a measure of the match intensity at that character position in the searchable text. The P value measures how big a segment of matching text this character position is part of. If the P value for a character position is 0, that character remained unmatched in the match calculation. The D value measures how far away this segment is from the corresponding segment in the query, given an optimal superposition of query over searchable text. The N value is set to one if the matching algorithm considers the character to be noise. Noise characters are considered much less important than non-noise characters and contribute much less to the match score. The V value is a summary score for the match strength on that character, computed from the P, D, and N values plus various weighting factors defined for the query. It is a value between 0.0 and 1.0 inclusive. For most applications, the V value should provide all of the visualization information needed.

The typical color visualization scheme utilizing these values associates six shades of some hue with V values, with brighter shades corresponding to higher V values. The V value range of 0.0 to 1.0 is split evenly into seven score ranges. The lowest range is not highlighted; the six higher ranges are assigned the six shades, with higher score ranges getting brighter shades.
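The seven-range scheme described above can be sketched as follows. This is an illustrative Python sketch, not the server's implementation: the function name shade_index is hypothetical, and the handling of values that fall exactly on a range boundary (assigned to the higher range here, with 1.0 kept in the top range) is an assumption.

```python
def shade_index(v: float, num_ranges: int = 7) -> int:
    """Map a V value in [0.0, 1.0] to one of num_ranges equal score ranges.

    Returns 0 for the lowest range (no highlight) and 1..6 for the six
    shades, where a higher index means a brighter shade.
    Assumption: boundary values belong to the higher range.
    """
    if not 0.0 <= v <= 1.0:
        raise ValueError("V values lie between 0.0 and 1.0 inclusive")
    # int() truncates; min() keeps v == 1.0 inside the top range.
    return min(int(v * num_ranges), num_ranges - 1)


for v in (0.0, 0.1, 0.5, 1.0):
    index = shade_index(v)
    label = "no highlight" if index == 0 else f"shade {index} of 6"
    print(f"V={v:.2f} -> {label}")
```

An application would then look up the actual display color for each shade index in a six-entry palette of its own choosing.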

When you include the LPAR_INT_VISUALSTYLE parameter set to a value of 256, lkt_dbsearch outputs a block in HTML format based on the V array. The blocks are returned in the minfo lpar list with each match, and are of type LPAR_BLK_HTML. This style accepts several lpars via the srchpars list for customization:

LPAR_BOOL_USECOLOR specifies whether or not colors are used to distinguish match strength. If true, it creates a gradient between LPAR_STR_BASECOLOR and LPAR_STR_MATCHCOLOR for different strengths of matches within the searchable text. This parameter must be true for LPAR_STR_BASECOLOR and LPAR_STR_MATCHCOLOR to be valid parameters.

Default value: true

LPAR_STR_BASECOLOR specifies the end of the color gradient for weak matches.
Note: The base color does not specify the color of unmatched text; it sets the weak-match end of the gradient so that matches fit well with the default text color of a webpage.

It must be a string of length six containing six hexadecimal digits corresponding to an RGB color. The first two digits correspond to the amount of red, the third and fourth to the amount of green, and the fifth and sixth to the amount of blue. Note that this is one of the standard formats for specifying colors in HTML.

Default value: "0000ff" (blue)

LPAR_STR_MATCHCOLOR specifies the end of the color gradient for strong matches.

The string's format is the same as that of LPAR_STR_BASECOLOR.

Default value: "ff0000" (red)
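To make the six-digit color format and the idea of a gradient between the two colors concrete, here is a minimal Python sketch. It is not the server's algorithm: the function names parse_rgb and blend are hypothetical, and linear interpolation per color channel is an assumption, since the text above does not say how intermediate colors are computed.

```python
import string


def parse_rgb(hex_color: str) -> tuple[int, int, int]:
    """Split a six-digit hexadecimal string into (red, green, blue)."""
    if len(hex_color) != 6 or any(c not in string.hexdigits for c in hex_color):
        raise ValueError("color must be exactly six hexadecimal digits")
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def blend(base_color: str, match_color: str, strength: float) -> str:
    """Interpolate linearly from base_color (0.0) to match_color (1.0).

    Assumption: a straight per-channel linear blend.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie between 0.0 and 1.0")
    base = parse_rgb(base_color)
    match = parse_rgb(match_color)
    mixed = (round(b + (m - b) * strength) for b, m in zip(base, match))
    return "".join(f"{channel:02x}" for channel in mixed)


# Using the documented defaults: blue for weak matches, red for strong ones.
print(parse_rgb("0000ff"))
print(blend("0000ff", "ff0000", 0.0))
print(blend("0000ff", "ff0000", 1.0))
```

A strength of 0.0 returns the base color unchanged and 1.0 returns the match color; values in between produce intermediate colors under the linear-blend assumption.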

LPAR_BOOL_USEBGCOLOR specifies whether or not background colors (text highlighting) are used to distinguish match strength. This functions like LPAR_BOOL_USECOLOR except it uses LPAR_STR_BASEBGCOLOR and LPAR_STR_MATCHBGCOLOR to create the gradient.

Default value: false

LPAR_STR_BASEBGCOLOR specifies the end of the background color gradient for weak matches. The string's format is the same as that of LPAR_STR_BASECOLOR.

Default value: "0000ff" (blue)

LPAR_STR_MATCHBGCOLOR specifies the end of the background color gradient for strong matches. The string's format is the same as that of LPAR_STR_BASECOLOR.

Default value: "ff0000" (red)

LPAR_INT_MAXFONTSIZE creates a font size gradient between the HTML document's default font size and this value. It must be between 0 and 4, where values of 1 to 4 increase the font size for stronger matches and 0 disables the feature.

Default value: 0 (disabled)

LPAR_INT_BOLDTHRESH is the threshold for displaying matching text in bold. It accepts a value between 0 and 6. The integer values 1 through 6 correspond to the six upper ranges of the V value range of 0.0 to 1.0 when it is divided evenly into seven ranges. Characters in the searchable text are bold if the corresponding V value falls in the range selected by this threshold or in a higher range. A threshold value of 0 disables this feature.

Default value: 0 (disabled)

LPAR_INT_ITALICSTHRESH is identical to LPAR_INT_BOLDTHRESH except that it specifies the threshold for italicized text.

Default value: 0 (disabled)

LPAR_INT_UNDERLINETHRESH is identical to LPAR_INT_BOLDTHRESH except that it specifies the threshold for underlined text.

Default value: 0 (disabled)
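The three threshold parameters can be read as comparisons against the seven-range division of V values. The Python sketch below illustrates that reading only; the function names v_range and styles_for are hypothetical, and both the boundary handling and the interpretation of a threshold as "this range or higher" are assumptions rather than confirmed server behavior.

```python
def v_range(v: float, num_ranges: int = 7) -> int:
    """Return the index (0..6) of the equal-width range containing v."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("V values lie between 0.0 and 1.0 inclusive")
    return min(int(v * num_ranges), num_ranges - 1)


def styles_for(v: float, bold_thresh: int = 0, italics_thresh: int = 0,
               underline_thresh: int = 0) -> list[str]:
    """List the text styles that apply to a character with V value v.

    Each threshold is 0 (disabled) or 1..6 (one of the six upper ranges).
    """
    thresholds = (("bold", bold_thresh),
                  ("italics", italics_thresh),
                  ("underline", underline_thresh))
    active = []
    for name, thresh in thresholds:
        if not 0 <= thresh <= 6:
            raise ValueError(f"{name} threshold must be between 0 and 6")
        # A threshold of 0 disables the style entirely.
        if thresh > 0 and v_range(v) >= thresh:
            active.append(name)
    return active


print(styles_for(0.9, bold_thresh=5))
print(styles_for(0.5, bold_thresh=5, underline_thresh=3))
print(styles_for(1.0))
```

With all thresholds left at their default of 0, no style is applied regardless of match strength.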

Visualization style 256 also returns LPAR_BLK_HTMLLEGEND for each search via the stats lpar list. It is an HTML segment that contains a strong-to-weak indicator to help interpret the match strengths in the LPAR_BLK_HTML lpars.