Creating a Thesaurus (lkt_create_thesaurus)

The following function creates a thesaurus, either locally or on a remote server.

dvkerr_t lkt_create_thesaurus(lpar_t host, lpar_t thesaurus_options,
                              lpar_t thesaurus_data);
dvkerr_t lkt_create_thesaurusT(lpar_t host, lpar_t thesaurus_options,
                               lpar_t tran,
                               lpar_t thesaurus_data);

Input

host (optional)

Identifies the server. For more information, see Communicating with TIBCO Patterns Servers.

thesaurus_options (required)

is a list containing thesaurus settings. It must contain one thesaurus name lpar (LPAR_STR_THESAURUSNAME). Like the database name, this is a null-terminated character string that is unique to the thesaurus.

The list might also contain the following:

LPAR_STR_THESAURUSTYPE specifies the type of thesaurus that is being created. Set this parameter to "weighted term" for a weighted term thesaurus, "substitution" for a standard thesaurus or "combined" for a combined thesaurus. The default value is "substitution".
LPAR_STR_CHARMAP The character map used in loading the thesaurus class items. Character maps are described in Character Mapping.
LPAR_INT_THESMATCHMODE Indicates whether error-tolerant (0) or exact-only (1) matching is used by this thesaurus. The default value is 0 (error tolerant).

If the thesaurus name is the only parameter needed, it can be passed directly without being encapsulated in a list.

thesaurus_data (required)

is a list of elements of type LPAR_BLKARR_THESAURUSCLASS. A standard substitution thesaurus class is an array of byte blocks that form an equivalence class (each element in the array is equivalent to every other element in the array). The elements of the block array must be UTF-8 encoded characters.

If this thesaurus is a weighted term thesaurus, the first item of the LPAR_BLKARR_THESAURUSCLASS array must be a floating point number in string format (as recognized by the C sscanf(3) function). This is the weight for the terms of the class. If the first item is not a valid floating point value, the load of the thesaurus fails and a file format error is returned. All following elements are the terms. As in the standard substitution thesaurus, these terms are considered equivalent.
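The weight format can be checked on the client before the class is submitted. The following is a minimal sketch only: parse_class_weight is a hypothetical helper, not part of the Patterns API, and it rejects trailing characters after the number, which is a stricter rule than the text above states.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a Patterns API call): returns 1 if `s` holds
 * a floating point number as read by sscanf(3), 0 otherwise.
 * On success the parsed value is stored in *weight.
 * Assumption: trailing non-blank characters make the string invalid. */
int parse_class_weight(const char *s, double *weight)
{
    char trailing;

    assert(s != NULL && weight != NULL);
    /* "%lf" reads the number; " %c" only succeeds if something other
     * than whitespace follows it, so exactly 1 conversion means the
     * string held a number and nothing else. */
    return sscanf(s, "%lf %c", weight, &trailing) == 1;
}
```

A caller could run this on the first item of each class and skip the create call when it returns 0, rather than waiting for the file format error from the server.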

If this thesaurus is a combined thesaurus, the first item of the LPAR_BLKARR_THESAURUSCLASS array is the term weight, as described for weighted term thesauri. The second item is the substitution penalty. It has the same format as the term weight, with the restriction that it must be a value between 0.0 and 1.0 inclusive.
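The two leading items of a combined class can be pre-checked in the same way. This is a sketch under assumptions: combined_class_header_ok is a hypothetical helper, not a Patterns API function, and it assumes that the items after the penalty are the terms of the class.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical pre-check for one combined thesaurus class.
 * items[0] is the term weight, items[1] the substitution penalty;
 * any remaining items are assumed to be the terms.
 * Returns 1 if both leading items are valid, 0 otherwise. */
int combined_class_header_ok(const char *const *items, int count)
{
    double weight, penalty;

    assert(items != NULL);
    if (count < 2)
        return 0;                       /* weight and penalty are both needed */
    if (sscanf(items[0], "%lf", &weight) != 1)
        return 0;                       /* weight is not a number */
    if (sscanf(items[1], "%lf", &penalty) != 1)
        return 0;                       /* penalty is not a number */
    /* The penalty must lie in the closed range [0.0, 1.0]. */
    return penalty >= 0.0 && penalty <= 1.0;
}
```

The weight is only checked for being numeric, because the text places no range restriction on it.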

Multi-token thesaurus matching is allowed. This means you can equate "hypertension" with "high blood pressure". Empty terms are not allowed and cause the create to fail.

A thesaurus can also be read directly from a file by passing, as the contents of this list, a single LPAR_STR_CSVFILE value and optionally an LPAR_STR_ENCODING value instead of the LPAR_BLKARR_THESAURUSCLASS values. If a host is specified, the file must exist within the host's loadable-data directory. Each line in the file is considered an equivalence class and must follow the rules for a class of the thesaurus type. The LPAR_STR_ENCODING value specifies the character encoding used in the file. The recognized encodings are UTF-8 and latin-1; the encoding names are case-sensitive. The default is latin-1. If only an LPAR_STR_CSVFILE value is needed, it can be given directly as the thesaurus_data value instead of being packaged in a list.

If the same token occurs in multiple equivalence classes, the tokens in a given class are not equivalent to the tokens of the other classes. For example, given the two equivalence classes (X,Y) and (X,Z), X is a synonym for both Y and Z, but Y and Z are not synonyms for each other.
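The rule that equivalence does not carry across classes can be modeled in a few lines. This is a toy sketch, not the server's implementation: thes_class, class_has, and are_synonyms are hypothetical names, and exact string comparison stands in for the real matching.

```c
#include <assert.h>
#include <string.h>

#define MAX_TERMS 4

/* Toy model of one equivalence class; unused slots are NULL. */
typedef struct {
    const char *terms[MAX_TERMS];
} thes_class;

/* Returns 1 if `term` appears in class `c`, 0 otherwise. */
int class_has(const thes_class *c, const char *term)
{
    for (int i = 0; i < MAX_TERMS && c->terms[i] != NULL; i++)
        if (strcmp(c->terms[i], term) == 0)
            return 1;
    return 0;
}

/* Two terms are synonyms only if a single class contains both of them.
 * Sharing a token with a third class does not link the classes. */
int are_synonyms(const thes_class *classes, int n,
                 const char *a, const char *b)
{
    assert(classes != NULL && a != NULL && b != NULL);
    for (int i = 0; i < n; i++)
        if (class_has(&classes[i], a) && class_has(&classes[i], b))
            return 1;
    return 0;
}
```

With the classes (X,Y) and (X,Z), are_synonyms reports X and Y as synonyms and X and Z as synonyms, but not Y and Z, because no single class holds both.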

tran (optional)

identifies the user transaction (LPAR_LONG_TRAN_ID) under which the thesaurus is to be created.

If a thesaurus with the specified thesaurus name already exists, it is replaced.

Warning: Thesaurus Character Maps: An Explanation and Caution
The thesaurus character map is used when loading the thesaurus class entries. The character map is applied to all of the class entries before they are stored.
Although there is no validation that forces you to do so, the thesaurus character map should always match the map used in the database fields to which the thesaurus is applied.
If the thesaurus character map differs from the character map for the field data, the thesaurus might be unable to find any matches. The default map for both the thesaurus and the field data is the same. In the vast majority of cases, this default mapping should be used for both.
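The effect of mismatched maps can be illustrated with a toy model. Everything here is an assumption made for illustration: map_fold_case and map_identity are hypothetical stand-ins for character maps, and exact comparison with strcmp stands in for the real matching, which is more tolerant.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* A stand-in "character map": copies `in` to `out`, transforming it. */
typedef void (*charmap_fn)(const char *in, char *out, size_t cap);

/* Hypothetical map that folds ASCII letters to lower case. */
void map_fold_case(const char *in, char *out, size_t cap)
{
    size_t i = 0;

    assert(cap > 0);
    for (; in[i] != '\0' && i + 1 < cap; i++)
        out[i] = (char)tolower((unsigned char)in[i]);
    out[i] = '\0';
}

/* Hypothetical map that leaves every character unchanged. */
void map_identity(const char *in, char *out, size_t cap)
{
    size_t i = 0;

    assert(cap > 0);
    for (; in[i] != '\0' && i + 1 < cap; i++)
        out[i] = in[i];
    out[i] = '\0';
}

/* Maps the thesaurus term and the field token with their own maps,
 * then compares the results. Returns 1 on an exact match. */
int term_matches(charmap_fn thes_map, charmap_fn field_map,
                 const char *term, const char *token)
{
    char a[64], b[64];

    thes_map(term, a, sizeof a);
    field_map(token, b, sizeof b);
    return strcmp(a, b) == 0;
}
```

When both sides use map_fold_case, "Hypertension" and "HYPERTENSION" map to the same string and match. When the field side uses map_identity instead, the stored term is "hypertension" while the field token stays "HYPERTENSION", so the comparison fails.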

 

Error codes and items returned by lkt_create_thesaurus

Error code      Item returned

NOCHARMAP       list containing character map that doesn't exist
EXPECTTHESNAME  item that was expected to be a thesaurus name
EXPECTLIST      item that was expected to be a list
FEATURESET      (none)
FILEFORMAT      LPAR_STR_FILEFORMATERROR description of error
INTERNAL        LPAR_STR_ERRORDETAILS description of error
IOERROR         LPAR_STR_SYSERROR description of error
PARAMCONFLICT   item in conflict with earlier item
PARAMMISSING    (none)
PARAMVAL        item that had an invalid value
TRAN_UNKNOWN    lpar that contains the unknown transaction id
TRAN_IN_USE     lpar that contains the transaction id
TRANCONFLICT    list that contains LPAR_LONGINT_TRAN_ID and LPAR_STR_ERRORDETAILS