Match Visualization
Many search technologies allow end users to see which portions of a returned record matched the query. The matching algorithm used in TIBCO Patterns lets users see not only which portions of a returned record matched the query, but also the degree to which each matched portion contributed to the overall measure of similarity between the record and the query. For every character in a returned record, TIBCO Patterns returns an “intensity” value that can be used to highlight portions of the record differentially, according to their match “intensity”. This differential highlighting is called match visualization. Typically, different font colors or type sizes are used to emphasize characters according to their associated intensity values.
If requested, the TIBCO Patterns server returns four values, known as V, P, D, and N, for each character in the searchable text fields of every record returned.
The V value provides all the visualization information needed for the vast majority of applications for which match visualization is required. It is a “composite” or “summary” intensity value reflecting the various dimensions of the matching algorithms. The P, D, and N values represent these dimensions as separate values. The P value indicates the length of the segment of matching text to which this character position belongs (zero for unmatched characters). The D value measures how far away this matching segment is from the corresponding segment in the query, given the optimal alignment of the query over the searchable record text computed by the matching algorithm. The N value for a matched character is set to one if that matched character was deemed “noise” by the matching algorithm, and hence contributed much less to the overall similarity score.
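As an illustration of how the per-character values relate to one another, the following minimal Python sketch models them as a small record and extracts the runs of characters that are matched (P greater than zero) and not flagged as noise (N equal to zero). The `CharViz` class, the `significant_spans` function, and the numeric types chosen for V and D are hypothetical; they are not part of the TIBCO Patterns API, which is described in the Programmer’s Guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharViz:
    """Hypothetical container for the four values of one character."""
    v: float  # composite intensity (numeric type and range are assumptions)
    p: int    # length of the matching segment; 0 for unmatched characters
    d: int    # distance of the segment from its counterpart in the query
    n: int    # 1 if the matched character was deemed noise, else 0


def significant_spans(values: list[CharViz]) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges of matched, non-noise characters."""
    spans = []
    start = None
    for i, c in enumerate(values):
        keep = c.p > 0 and c.n == 0
        if keep and start is None:
            start = i                  # a new run begins here
        elif not keep and start is not None:
            spans.append((start, i))   # the current run ended before i
            start = None
    if start is not None:
        spans.append((start, len(values)))
    return spans
```

An application that needs only a single highlight level could use such spans directly, while the V value remains the better choice for graded highlighting.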
The most typical color visualization scheme associates six shades of a single hue with ranges of the V value, with brighter shades corresponding to higher V values (greater match intensities). The V values are split evenly into seven ranges. The lowest range is typically not highlighted (matching characters in this range are generally “noise”), and the six successively higher ranges are assigned the six shades in order of increasing brightness.
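The range assignment described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `shade_index` is hypothetical, and it assumes that V values lie between zero and a known maximum `v_max`, which is an assumption rather than a documented property of the V value.

```python
def shade_index(v: float, v_max: float = 1.0, n_ranges: int = 7) -> int:
    """Map a V value to one of n_ranges equal-width ranges.

    Returns 0 for the lowest range (not highlighted) and 1..n_ranges-1
    for the shades of increasing brightness.
    """
    if v_max <= 0:
        raise ValueError("v_max must be positive")
    # Clamp so that out-of-range inputs still map to a valid range.
    clamped = min(max(v, 0.0), v_max)
    # The top edge (v == v_max) is folded into the highest range.
    return min(int(clamped / v_max * n_ranges), n_ranges - 1)
```

With the default seven ranges, a result of 0 means the character is left unhighlighted, and results 1 through 6 select one of the six shades.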
To make this kind of color visualization easier, TIBCO Patterns provides an interface for specifying a base color (usually selected to suit the color palette of a web page) and a matching color (representing the strongest match intensity). TIBCO Patterns then computes the color gradient and returns result lists already formatted with the appropriate HTML tags for color visualization.
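To give a sense of what such a gradient and the resulting markup might look like, here is a minimal Python sketch. It assumes linear interpolation in RGB between the base and matching colors and wraps each run of equally shaded characters in a `span` element with an inline style. The function names, the interpolation method, and the markup are all assumptions made for illustration; the source does not specify how TIBCO Patterns computes the gradient or which tags it emits.

```python
from html import escape
from itertools import groupby


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return (int(color[0:2], 16), int(color[2:4], 16), int(color[4:6], 16))


def gradient(base: str, match: str, shades: int = 6) -> list[str]:
    """Return `shades` colors stepping linearly from base toward match.

    The last entry equals the matching color; the base color itself is
    left for unhighlighted text and is not included.
    """
    b, m = hex_to_rgb(base), hex_to_rgb(match)
    palette = []
    for step in range(1, shades + 1):
        t = step / shades
        rgb = [round(bc + (mc - bc) * t) for bc, mc in zip(b, m)]
        palette.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return palette


def render_html(text: str, shade_indices: list[int], palette: list[str]) -> str:
    """Wrap runs of characters sharing a non-zero shade index in spans.

    A shade index of 0 means "not highlighted"; index k > 0 selects
    palette[k - 1].
    """
    if len(text) != len(shade_indices):
        raise ValueError("need exactly one shade index per character")
    parts = []
    for idx, group in groupby(zip(text, shade_indices), key=lambda pair: pair[1]):
        chunk = escape("".join(ch for ch, _ in group))
        if idx == 0:
            parts.append(chunk)
        else:
            parts.append(f'<span style="color:{palette[idx - 1]}">{chunk}</span>')
    return "".join(parts)


# Example: black base color, red matching color, strongest match on "Smith".
palette = gradient("#000000", "#ff0000")
html_out = render_html("John Smith", [0, 0, 0, 0, 0, 6, 6, 6, 6, 6], palette)
```

Grouping adjacent characters with the same shade keeps the markup compact, since one element covers a whole run instead of a single character.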
For more details and other visualization options, see the TIBCO® Patterns Programmer’s Guide.