TIBCO Adapter products are based on the TIBCO messaging infrastructure, which supports data transmission in English and most European, Middle Eastern, and Asian languages. This section first describes basic encoding concepts, then describes how an adapter transmits data in various languages.
When the 7-bit ASCII character set was extended to support Western European languages, an eighth bit was added, creating 128 more code positions. This 8-bit (1-byte) character set is ISO-8859-1 (or Latin1). In ISO-8859-1, values with the most significant bit set occupy the additional 128 positions, and most of them are used to encode Western European characters. When the most significant bit is not set, the eight bits still represent ASCII characters. Thus ASCII is a subset of this extended character set.
Because one byte was not sufficient to also include the Eastern European characters, a separate encoding or character set, called ISO-8859-2 (or Latin2), was created, which is not compatible with ISO-8859-1. Other ISO-8859 series character sets were also created to handle a larger selection of languages. All of these ISO-8859 character sets are supersets of, and backward compatible with, ASCII.
However, the code space provided by 1 byte (8 bits) is not enough to represent the tens of thousands of characters used in languages such as Chinese, Japanese, and Korean (CJK). Thus, multibyte character sets were developed, together with a corresponding encoding method for each character set. These include locale-independent encodings such as ISO-2022 and EUC, and locale-dependent encodings such as Shift_JIS (Japanese), GB2312 (Simplified Chinese), Big5 (Traditional Chinese), and so on.
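The relationship between ASCII, ISO-8859-1, and ISO-8859-2 can be checked with a short Python sketch. It is illustrative only and uses Python's standard codecs rather than anything from the adapter SDK; the sample text and byte value are arbitrary choices.

```python
# Bytes below 0x80 decode identically under ASCII, ISO-8859-1, and ISO-8859-2.
ascii_bytes = b"Adapter"
decoded = {
    enc: ascii_bytes.decode(enc)
    for enc in ("ascii", "iso-8859-1", "iso-8859-2")
}

# A byte with the most significant bit set means different things
# depending on which ISO-8859 character set is assumed.
high_byte = bytes([0xA3])
as_latin1 = high_byte.decode("iso-8859-1")  # pound sign (U+00A3)
as_latin2 = high_byte.decode("iso-8859-2")  # capital L with stroke (U+0141)

print(decoded)
print(repr(as_latin1), repr(as_latin2))
```

The same byte yields two different characters, which is why data tagged with the wrong ISO-8859 character set is silently misread rather than rejected.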
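As a rough illustration of multibyte encodings, the following Python sketch encodes short CJK strings with some of the encodings named above and counts the resulting bytes. The sample strings are arbitrary, and the codec names are Python's, not TIBCO's.

```python
# Each sample pairs a Python codec name with a short string in that language.
samples = {
    "shift_jis": "\u65e5\u672c\u8a9e",  # Japanese, three characters
    "euc_jp": "\u65e5\u672c\u8a9e",     # the same text in another Japanese encoding
    "gb2312": "\u7b80\u4f53",           # Simplified Chinese, two characters
    "big5": "\u7e41\u9ad4",             # Traditional Chinese, two characters
}

byte_counts = {}
for codec, text in samples.items():
    encoded = text.encode(codec)
    byte_counts[codec] = (len(text), len(encoded))
    print(f"{codec:10} {len(text)} characters -> {len(encoded)} bytes: {encoded.hex(' ')}")
```

Each of these characters takes two bytes in its legacy encoding, and the byte values differ between Shift_JIS and EUC-JP even though the text is the same.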
Unicode is a way to represent characters of all known languages of the world, which are defined in a character set named UCS (Universal Character Set). It provides a unique code point for every character regardless of platform, program or language.
The advantage of Unicode is that characters from the world's major scripts are uniformly supported. Thus applications running on different platforms with different locales can exchange information without misinterpretation, as long as they follow this uniform character set.
A locale is the information for a specific combination of language, territory (cultural data), and codeset. Examples are en_US.ASCII, ja_JP.Shift_JIS, etc. Normally, an operating system is started with a particular locale.
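The three parts of a locale name can be pulled apart with a few lines of Python. This is a minimal sketch that assumes the simple language_TERRITORY.codeset form used in the examples above; `parse_locale` and `Locale` are hypothetical names introduced here for illustration, not part of any TIBCO API.

```python
from typing import NamedTuple


class Locale(NamedTuple):
    language: str
    territory: str
    codeset: str


def parse_locale(name: str) -> Locale:
    """Split a locale name of the form language_TERRITORY.codeset."""
    lang_territory, _, codeset = name.partition(".")
    language, _, territory = lang_territory.partition("_")
    return Locale(language, territory, codeset)


print(parse_locale("en_US.ASCII"))
print(parse_locale("ja_JP.Shift_JIS"))
```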
UCS is the abbreviation for Universal Character Set, which is specified by International Standard ISO/IEC 10646. UCS contains the characters of almost all the world's major scripts.
Unicode is a standard that defines a coded character set (UCS, or Universal Character Set, defined in ISO/IEC 10646) that assigns a unique code point (scalar value) to each character of almost all the world's major languages. The Unicode standard also includes a series of character encodings that represent each of these code points, such as UTF-8, UTF-16 (which extends the older fixed-width UCS-2), UTF-32, etc. For more information about Unicode, refer to the Unicode Consortium's official web site: www.unicode.org.
UTF-8 denotes Unicode Transformation Format 8 bit, a common Unicode encoding that serializes a Unicode scalar value as a sequence of one to four bytes. The purpose of the transformation encoding is that data represented this way can be passed reliably through single-byte environments.
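The difference between a code point and its encodings can be seen in a short Python sketch. The character chosen is arbitrary, and big-endian byte order is used only so that no byte order mark appears in the output.

```python
ch = "\u65e5"  # the character U+65E5

code_point = ord(ch)           # the single code point Unicode assigns
utf8 = ch.encode("utf-8")      # three bytes
utf16 = ch.encode("utf-16-be")  # one 16-bit code unit
utf32 = ch.encode("utf-32-be")  # one 32-bit code unit

print(f"code point: U+{code_point:04X}")
print("UTF-8 :", utf8.hex(" "))
print("UTF-16:", utf16.hex(" "))
print("UTF-32:", utf32.hex(" "))
```

The code point stays the same in every case; only the byte sequence that represents it changes with the encoding.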
UTF-8 is popular for data exchange between applications and protocols such as HTTP, XML, MIME, and SOAP. One nice feature of UTF-8 is that it is backward compatible with ASCII.
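Both properties of UTF-8 described above, the variable length of one to four bytes and the backward compatibility with ASCII, can be checked with a small Python sketch. The sample characters are arbitrary.

```python
# One sample character for each possible UTF-8 sequence length.
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]
lengths = [len(s.encode("utf-8")) for s in samples]
print(lengths)

# Pure ASCII text produces exactly the same bytes in UTF-8 and in ASCII.
ascii_text = "TIBCO"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)
```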
TIBCO Adapter products provide Unicode encoding support by taking advantage of UTF-8 as the TIBCO messaging encoding when exchanging data among TIBCO components (TIBCO applications and adapters). Obviously, this can apply only to text data.
In an adapter project where only ASCII or Latin-1 (ISO8859-1) data is exchanged between adapters and other TIBCO products, ISO8859-1 can be used as the TIBCO messaging encoding.
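Whether a piece of text is safe to carry in ISO8859-1 can be tested by attempting the encoding. The Python sketch below is one way to express that check; `fits_latin1` is a hypothetical helper named here for illustration and is not an adapter function.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


for sample in ("Order 42", "M\u00fcller", "\u65e5\u672c\u8a9e", "\u20ac100"):
    print(repr(sample), fits_latin1(sample))
```

Text that fails this check, such as Japanese characters or the euro sign, needs UTF-8 as the messaging encoding to avoid data loss.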
TIBCO Adapter products still use other native encodings (such as the ISO-8859 series, the MS-Windows series, Shift_JIS, Big5, GB2312, etc.) to communicate with vendor applications (for example, SAP, Oracle, or Siebel), but internally the adapters use UCS-2 to represent text data. The adapters are therefore responsible for the encoding conversion between the vendor encoding and UCS-2, and between the TIBCO messaging encoding and UCS-2. The next diagram shows an example in which TIBCO Adapter for ActiveDatabase converts between database character strings and Unicode strings.
Shift_JIS data retrieved from the database by the adapter is converted to UTF-8 and published using a TIBCO messaging transport. Another adapter, configured with a subscription service, converts the received message from UTF-8 into EUC-JP (another Japanese encoding) and inserts the data into the EUC-JP database to which it is connected.
By using UTF-8 as the TIBCO messaging encoding, the two adapters, which connect to databases with different encodings, can exchange data without data loss. Also, the systems where the two adapters are running need not be in the same locale. In the above scenario, for example, the adapter publication service can run on a Windows platform with Shift_JIS as the locale encoding, while the adapter subscription service can run on a Solaris platform with EUC-JP as the locale encoding.
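The conversion chain in this scenario can be sketched in Python. This is not adapter code: `publish` and `subscribe` are hypothetical stand-ins for the publication and subscription services, and Python's `str` type plays the role of the adapter's internal Unicode representation.

```python
def publish(vendor_bytes: bytes, vendor_encoding: str,
            wire_encoding: str = "utf-8") -> bytes:
    """Decode vendor data to Unicode, then encode it for the message transport."""
    text = vendor_bytes.decode(vendor_encoding)
    return text.encode(wire_encoding)


def subscribe(wire_bytes: bytes, target_encoding: str,
              wire_encoding: str = "utf-8") -> bytes:
    """Decode a received message to Unicode, then encode it for the target system."""
    text = wire_bytes.decode(wire_encoding)
    return text.encode(target_encoding)


source_row = "\u6771\u4eac".encode("shift_jis")  # data as stored in the Shift_JIS database
message = publish(source_row, "shift_jis")        # UTF-8 bytes on the wire
target_row = subscribe(message, "euc_jp")         # data as stored in the EUC-JP database

print(source_row.hex(" "), "->", message.hex(" "), "->", target_row.hex(" "))
```

If `wire_encoding` were set to `"iso-8859-1"` in this sketch, `publish` would raise `UnicodeEncodeError` for the Japanese text, which mirrors why UTF-8 is needed as the messaging encoding here.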
The encoding property is set on the project itself at design-time, and in the TIBCO administration server’s property file when creating a TIBCO Administration Domain.
The project setting is used at design-time when using the Adapter Tester or TIBCO BusinessWorks tester to verify an adapter instance or BusinessWorks process configuration. The project setting is also used when the project is exported as a local repository (in .dat format).
The encoding value is set on the root project folder. By default, the value is set to ISO8859-1. You can change the value by selecting the folder and, under the Project Settings tab, changing the value for the TIBCO Message Encoding field.
The TIBCO administration server setting is used when the project is exported to a server repository or deployed using TIBCO Administrator Enterprise Edition.
For a server-based project, the TIBCO messaging encoding is set by the repo.encoding property in the server's tibcoadmin<domain-name>.tra configuration file (located in <install-path>/tibco/administrator/n.n/bin/).
The encoding is set when using the TIBCO Domain Utility to create the domain or by editing the repo.encoding property in the .tra configuration file.
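For checking which value a server is configured with, a property can be read out of the file's text. The Python sketch below assumes the .tra file holds one name=value pair per line with # comment lines; that layout, the sample excerpt, and the `read_property` helper are assumptions made for illustration, so verify them against an actual file before relying on this.

```python
from typing import Optional


def read_property(tra_text: str, key: str) -> Optional[str]:
    """Return the value of key from name=value lines, or None if absent."""
    for raw_line in tra_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Hypothetical excerpt; a real .tra file contains many more properties.
sample = """\
# server settings
repo.encoding=UTF-8
"""

print(read_property(sample, "repo.encoding"))
```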
Each adapter or TIBCO application that uses the same server for storing and retrieving configuration data uses this encoding setting when communicating with the others. This ensures that all TIBCO components (including adapters and other TIBCO applications) that belong to the same project use the same encoding value to communicate.
For TIBCO Adapter release 4.x, the
The following diagram shows a scenario where the TIBCO messaging encoding of two adapter services is controlled by the server's encoding property. The property is set to UTF-8.
The encoding property set in the project file is always superseded by the server's encoding property.
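The precedence rule can be written as a one-line decision. In the Python sketch below, `effective_encoding` is a hypothetical name, and passing `None` for the server value models a project that is not connected to an administration server (for example, one run from a local repository).

```python
from typing import Optional


def effective_encoding(project_encoding: str,
                       server_encoding: Optional[str]) -> str:
    """Pick the messaging encoding: the server's value wins whenever it exists."""
    if server_encoding is not None:
        return server_encoding
    return project_encoding


print(effective_encoding("ISO8859-1", "UTF-8"))  # server-based project
print(effective_encoding("ISO8859-1", None))     # local project
```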
The encoding property discussed above is the encoding used by the TIBCO messaging between adapters and applications, not the encoding used for the persistent storage of the project files.
In a typical integration environment, all TIBCO ActiveEnterprise adapters and applications use the same project, connect to the same administration server, and use the server encoding property. Two encoding choices are available for this kind of project: ISO8859-1 and UTF-8.
Note that all the world's major language characters can be represented by UTF-8. Also, note that additional encodings can be used by an adapter to communicate with its vendor application or database. See Appendix B, Encoding Tables for a list of additional encodings supported by most adapters.
TIBCO Adapter™ Concepts April 2005 Copyright © TIBCO Software Inc. All rights reserved www.tibco.com |