Additional Considerations Related to Thesaurus Tables
The following considerations apply to all variants of thesaurus tables:
| • | TIBCO Patterns lets you associate a specific thesaurus table with each simple, cognate, or Variable Attributes query. Thus, advanced queries that combine the results of several simple or cognate queries can involve several different thesaurus tables. |
| • | By default, TIBCO Patterns thesaurus tables incorporate a limited degree of error tolerance in the detection of thesaurus terms. A simple misspelling of a term in either the query or the record (or even in both) generally does not prevent its detection as a term occurring in a thesaurus table. Exact-only detection of thesaurus terms is also provided as an option. |
| • | In either the weighted dictionary or the combined thesaurus table, the semantic term weight for a class can be given the special value -1.0, indicating a stop token. A term identified as a stop token is ignored entirely: it is not matched, and the fact that it is not matched does not influence the score in any way. Note that this is different from a term weight of 0.0. With a zero weight, a term still participates in matching and thus might influence the score. With a term weight of -1.0, the term is effectively removed from the query and record before matching is performed. |
| • | A particular word or phrase in a query or record might have multiple substitution possibilities. Although the rules for resolving such ambiguities are complex, in general, they are resolved in such a way as to maximize the overall score of the query and record combination. |
| • | Thesaurus data (equivalence classes and associated weight values) can be constructed programmatically at the API level, or read from files in CSV format. The data can then be loaded into the TIBCO Patterns server as a resident in-memory thesaurus available for all searches. |
| • | Alternatively, a thesaurus can be defined at query time and supplied (along with its content) as one of the parameters of the search. A thesaurus table defined in this way exists only for the duration of the query, and is therefore known as an ephemeral thesaurus. An ephemeral thesaurus is appropriate in cases where possible synonyms or term weights have to be generated dynamically based on the content of the query, and cannot be predefined for all possible queries. Although there is no size limit for an ephemeral thesaurus, it is strongly recommended to limit it to a few classes. Creating an ephemeral thesaurus by reading from a CSV file is likewise possible, but not recommended, for performance reasons. |
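
The difference between a stop token (weight -1.0) and a zero-weight term, described in the list above, can be illustrated with a small conceptual sketch. This is not the TIBCO Patterns API and not its scoring algorithm: the function name `prepare_terms`, the default weight of 1.0, and the plain dictionary of weights are all assumptions made for illustration. The sketch only shows that a stop token is removed before matching, whereas a zero-weight term remains in the term list that would be passed to matching.

```python
# Conceptual sketch only; names and the default weight are hypothetical,
# not part of the TIBCO Patterns API.
STOP_TOKEN_WEIGHT = -1.0


def prepare_terms(terms, weights, default_weight=1.0):
    """Return (term, weight) pairs that would take part in matching.

    Terms whose weight is the special stop-token value are dropped
    entirely. Terms with weight 0.0 are kept, because a zero-weight
    term still participates in matching.
    """
    kept = []
    for term in terms:
        weight = weights.get(term, default_weight)
        if weight == STOP_TOKEN_WEIGHT:
            continue  # stop token: removed before matching
        kept.append((term, weight))
    return kept


if __name__ == "__main__":
    weights = {"inc": -1.0, "the": 0.0}
    print(prepare_terms(["the", "acme", "inc"], weights))
    # prints [('the', 0.0), ('acme', 1.0)]
```

In this toy model, "inc" disappears from the term list, while "the" remains with a weight of 0.0 and would therefore still be considered during matching.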