Copyright © Cloud Software Group, Inc. All Rights Reserved



Customize the TIBCO Object Service Broker Unicode Processing
Overview
You can configure the conversion, collation, and case processing of Unicode data in TIBCO Object Service Broker. TIBCO Object Service Broker comes with a set of source configuration files and you can:
Specify conversions between Unicode and External User Syntaxes using files from http://dev.icu-project.org/cgi-bin/viewcvs.cgi/charset/data/ucm/
You can use these source files to generate binary configuration files. If the binary files are present, they are read at initialization time and replace the default configuration data. The defaults in the system correspond to the IBM-037 code page. There are no External User Syntaxes defined by default.
There are five types of configuration data, used for:
Format of the Data Files
Each of the first four source configuration file types consists of lines no longer than 80 characters:
Data lines can include comments, which follow an asterisk, after the required fields. The formats of data lines for the four types of files are shown below. The names in parentheses are the names of the files to be used to configure the system.
The fifth source configuration file type is a ucm (UniCode Mapping) file which specifies a mapping between Unicode and a user-defined external syntax. You can have up to 16 files of this type to map up to 16 different external user syntaxes.
Unicode to EBCDIC Mapping (UniToEbc)
Data mapping lines contain two significant fields, separated by white space:
For example, here is a portion of a file:
* TIBCO Object Service Broker Unicode to EBCDIC conversion file
* Based on EBCDIC code page IBM-037.
0030 F0 *The character '0'
0031 F1 *The character '1'
A Unicode character can be mapped only once. You can map more than one Unicode character to the same EBCDIC character.
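The rules above can be checked mechanically before a source file is passed to unigen. The following Python sketch is not part of TIBCO Object Service Broker. It assumes that the first field is a hexadecimal Unicode code point (0000 to FFFF), that the second field is a hexadecimal single-byte EBCDIC value (00 to FF, as in the sample), and that everything from the first asterisk onward is a comment. The function name parse_uni_to_ebc is hypothetical.

```python
def parse_uni_to_ebc(lines):
    """Parse UniToEbc-style lines into {unicode_code_point: ebcdic_byte}.

    Assumptions (not confirmed by the product documentation): both fields
    are hexadecimal, and text from the first asterisk onward is a comment.
    """
    mapping = {}
    for number, raw in enumerate(lines, start=1):
        if len(raw.rstrip("\n")) > 80:
            raise ValueError(f"line {number}: longer than 80 characters")
        data = raw.split("*", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = data.split()
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 fields, got {len(fields)}")
        code_point = int(fields[0], 16)
        ebcdic = int(fields[1], 16)
        if not 0 <= code_point <= 0xFFFF:
            raise ValueError(f"line {number}: code point out of range")
        if not 0 <= ebcdic <= 0xFF:
            raise ValueError(f"line {number}: EBCDIC value out of range")
        # A Unicode character can be mapped only once.
        if code_point in mapping:
            raise ValueError(f"line {number}: U+{code_point:04X} mapped more than once")
        # Several code points may share one EBCDIC value, so no check on that side.
        mapping[code_point] = ebcdic
    return mapping


sample = [
    "* TIBCO Object Service Broker Unicode to EBCDIC conversion file",
    "* Based on EBCDIC code page IBM-037.",
    "0030 F0 *The character '0'",
    "0031 F1 *The character '1'",
]
table = parse_uni_to_ebc(sample)
print({f"{cp:04X}": f"{b:02X}" for cp, b in table.items()})
```

A dictionary keyed by code point makes the "mapped only once" rule a simple membership test, while leaving many-to-one mappings onto the same EBCDIC value untouched.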
EBCDIC to Unicode Mapping (EbcToUni)
Data mapping lines contain two significant fields, separated by white space:
For example, here is a portion of a file:
* TIBCO Object Service Broker EBCDIC to Unicode conversion file
* Based on EBCDIC code page IBM-037.
F0 0030 *The character '0'
F1 0031 *The character '1'
An EBCDIC character can be mapped only once.
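Because several Unicode characters may map to one EBCDIC character while each EBCDIC character maps only once, the two tables are not necessarily mirror images. The product documentation does not require any particular relationship between them; the Python sketch below only illustrates one consistency check you might choose to run, namely that every EBCDIC value converts to a Unicode character that converts back to the same EBCDIC value. The function name and the in-memory dictionaries are illustrative assumptions.

```python
def round_trip_mismatches(uni_to_ebc, ebc_to_uni):
    """Return EBCDIC values whose Unicode mapping does not map back to them."""
    mismatches = []
    for ebcdic, code_point in sorted(ebc_to_uni.items()):
        if uni_to_ebc.get(code_point) != ebcdic:
            mismatches.append(ebcdic)
    return mismatches


# Entries taken from the sample file portions shown in this section.
uni_to_ebc = {0x0030: 0xF0, 0x0031: 0xF1}
ebc_to_uni = {0xF0: 0x0030, 0xF1: 0x0031}
print(round_trip_mismatches(uni_to_ebc, ebc_to_uni))  # prints []
```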
Unicode Case Mapping (UniCase)
Case mapping lines contain three significant fields, separated by white space:
For example, here is a portion of a file:
* TIBCO Object Service Broker Unicode Case Mapping File
* Based on Unicode locale en_US.
0041 U 0061 * A
FF22 U FF42 * B
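As a rough illustration of the three-field layout, the Python sketch below splits case mapping lines into their fields. It assumes that the first and third fields are hexadecimal code points and treats the middle field as an opaque flag, because its meaning is not stated here. The function name parse_uni_case is hypothetical.

```python
def parse_uni_case(lines):
    """Split UniCase-style lines into (code_point, flag, code_point) tuples.

    The middle field is kept as an uninterpreted string (an assumption made
    because this sketch does not know its defined values).
    """
    entries = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.split("*", 1)[0].split()  # ignore the trailing comment
        if not fields:
            continue  # blank or comment-only line
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        first, flag, third = fields
        entries.append((int(first, 16), flag, int(third, 16)))
    return entries


sample = [
    "* TIBCO Object Service Broker Unicode Case Mapping File",
    "0041 U 0061 * A",
    "FF22 U FF42 * B",
]
for first, flag, third in parse_uni_case(sample):
    print(f"{first:04X} {flag} {third:04X}")
```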
Unicode Collation (UniColl)
Data lines contain a single significant field: a hex value (0000 to FFFF) representing a Unicode code point. The data lines list the code points in order of their collation. The file must contain 65,536 unique data lines to specify all possible code points.
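The completeness requirement is easy to violate when editing a collation file by hand, so a pre-check can be useful. The Python sketch below is an illustration, not a TIBCO tool: it verifies that a sequence of lines contains exactly 65,536 unique code points in the range 0000 to FFFF and returns them in collation order. It assumes that text after an asterisk is a comment, as for the other source file types. The function name validate_collation is hypothetical.

```python
def validate_collation(lines):
    """Return the code points in collation order, or raise ValueError."""
    seen = set()
    order = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.split("*", 1)[0].split()  # ignore the trailing comment
        if not fields:
            continue  # blank or comment-only line
        if len(fields) != 1:
            raise ValueError(f"line {number}: expected a single field")
        code_point = int(fields[0], 16)
        if not 0 <= code_point <= 0xFFFF:
            raise ValueError(f"line {number}: code point out of range")
        if code_point in seen:
            raise ValueError(f"line {number}: U+{code_point:04X} listed twice")
        seen.add(code_point)
        order.append(code_point)
    if len(order) != 0x10000:
        raise ValueError(f"expected 65536 data lines, found {len(order)}")
    return order


# Demonstration with a trivial collation: code points in numeric order.
identity = [f"{cp:04X}" for cp in range(0x10000)]
print(len(validate_collation(identity)))  # prints 65536
```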
Unicode to/from External User Syntax Mapping (UniXC01-UniXC16)
The format is described at http://icu.sourceforge.net/userguide/conversion-data.html. Data lines contain three significant fields, separated by white space:
Sample Unicode Configuration Files Provided
This is a list of the names of the sample configuration files shipped with TIBCO Object Service Broker (on Solaris, the names are case sensitive). These files appear in your %HURON%/UnicodeConfig directory. The 3- or 4-digit numbers in the filenames refer to the IBM-xxx EBCDIC code page they are based on. Use the files as they are, or modify copies of them to create the desired configuration specification.
Unicode to EBCDIC Mapping
Unicode Collation
Creating Binary Unicode Configuration Files
The unigen utility program converts the source configuration files into the binary files that are read at initialization time. Run unigen to create binary files for the first four file types if the defaults are not suitable for you. Also run unigen for the fifth type if you want to define any external user syntaxes. Write all binary files generated by unigen to the directory specified by the UNICODEDIR parameter or environment variable (see the next section). Usage for the unigen executable is as follows:
unigen n source target format syntax codepage fallback
where:
 
Specifying Unicode Configuration
You use the UNICODEDIR Data Object Broker parameter, the UNICODEDIR Execution Environment parameter, and the UNICODEDIR environment variable (for the offline batch utilities) to specify the directory where the Unicode configuration files reside, for example, %HURON%/database/UNICODEDIR.
Specific filenames are used for the configuration files.
If UNICODEDIR is not specified, no configuration files are read: the default configurations that are part of the TIBCO Object Service Broker application are used, and no external user syntaxes are defined. If UNICODEDIR is specified, the TIBCO Object Service Broker initialization code looks for each of the first four files in the specified directory. If a file is present, TIBCO Object Service Broker uses it to configure its Unicode processing; otherwise, it uses the default configuration for that file. All files with names matching UniXCnn (where nn is from 01 to 16) are read and used to configure up to 16 external user syntaxes, XC01 to XC16.
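The lookup rules in the preceding paragraph can be summarized in a short Python sketch. It is only a model of the described behavior, not the product's initialization code. It assumes that the binary files carry exactly the names UniToEbc, EbcToUni, UniCase, UniColl, and UniXC01 to UniXC16, and that a missing directory behaves like an unset UNICODEDIR. The function name resolve_unicode_config is hypothetical.

```python
import re
import tempfile
from pathlib import Path

CORE_FILES = ("UniToEbc", "EbcToUni", "UniCase", "UniColl")
SYNTAX_FILE = re.compile(r"UniXC(\d{2})")


def resolve_unicode_config(unicode_dir):
    """Return (core, syntaxes) for a UNICODEDIR value.

    core maps each of the four file names to a path, or to None when the
    built-in default applies. syntaxes maps XC01..XC16 to the files found.
    """
    core = {name: None for name in CORE_FILES}
    syntaxes = {}
    if unicode_dir is None:
        return core, syntaxes  # nothing is read; defaults only
    directory = Path(unicode_dir)
    if not directory.is_dir():
        return core, syntaxes  # assumption: treated like an unset UNICODEDIR
    for name in CORE_FILES:
        candidate = directory / name
        if candidate.is_file():
            core[name] = candidate  # overrides the default for this file only
    for entry in sorted(directory.iterdir()):
        match = SYNTAX_FILE.fullmatch(entry.name)
        if match and entry.is_file() and 1 <= int(match.group(1)) <= 16:
            syntaxes[f"XC{match.group(1)}"] = entry
    return core, syntaxes


# Demonstration with a temporary directory holding three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("UniToEbc", "UniXC02", "UniXC17"):
        (Path(tmp) / name).write_bytes(b"")
    core, syntaxes = resolve_unicode_config(tmp)

print(sorted(name for name, path in core.items() if path is not None))
print(sorted(syntaxes))
```

In the demonstration, only UniToEbc overrides a default, and UniXC17 is ignored because it falls outside the 01 to 16 range.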
For example, to override only the Unicode to EBCDIC mapping file, do the following:
1. For correct performance, choose the uexxx file, as supplied with TIBCO Object Service Broker, that corresponds to the NLS code page of your system.
2.
3. Run unigen 1 source target 2 where source and target are the path names of the source and target files.
4. Copy the target file to the Unicode configuration directory specified in your UNICODEDIR Execution Environment and Data Object Broker parameters.
5.
To define external user syntaxes XC01 through XC03, do the following:
1. Select the .ucm files corresponding to the code pages you want to define as your external user syntaxes. Assume that you want to use code page IBM-939 for syntax XC01, IBM-939 with fallback codes for syntax XC02, and IBM-933 for syntax XC03. Also assume that directory D:\sourceConfig contains the files ibm939.ucm and ibm933.ucm and that the desired output directory is C:\Unicode.
2.

 
unigen 5 D:\sourceConfig\ibm939.ucm C:\Unicode\UniXC01 2 1 IBM-939
unigen 5 D:\sourceConfig\ibm939.ucm C:\Unicode\UniXC02 2 2 IBM-939 true
unigen 5 D:\sourceConfig\ibm933.ucm C:\Unicode\UniXC03 2 3 IBM-933
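When several external user syntaxes are defined, the command lines above follow a regular pattern that a small script can generate. The Python sketch below only builds the argument lists; it does not run unigen. It copies the format argument 2 from the commands above without interpreting it, and it assumes that the target file name is always UniXCnn with nn equal to the syntax number. The function name unigen_type5_command is hypothetical.

```python
from pathlib import PureWindowsPath


def unigen_type5_command(source, target_dir, syntax_number, codepage, fallback=False):
    """Build the argument list for one 'unigen 5 ...' invocation."""
    if not 1 <= syntax_number <= 16:
        raise ValueError("syntax number must be between 1 and 16")
    target = PureWindowsPath(target_dir) / f"UniXC{syntax_number:02d}"
    args = [
        "unigen",
        "5",                          # file type 5: external user syntax mapping
        str(PureWindowsPath(source)),
        str(target),
        "2",                          # format value copied from the examples above
        str(syntax_number),
        codepage,
    ]
    if fallback:
        args.append("true")           # fallback argument, as in the XC02 example
    return args


requests = [
    (r"D:\sourceConfig\ibm939.ucm", 1, "IBM-939", False),
    (r"D:\sourceConfig\ibm939.ucm", 2, "IBM-939", True),
    (r"D:\sourceConfig\ibm933.ucm", 3, "IBM-933", False),
]
for source, number, codepage, fallback in requests:
    print(" ".join(unigen_type5_command(source, r"C:\Unicode", number, codepage, fallback)))
```

The printed lines reproduce the three commands shown above. To execute them, the argument lists could be passed to subprocess.run on a system where unigen is on the path.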

 
