Copyright © TIBCO Software Inc. All Rights Reserved



Adapter SDK Unicode Support
The C++ API and the Java API both support Unicode for application data. This allows custom adapters to work with Unicode strings programmatically and to send and receive data between applications that use a variety of supported encodings. The data can be aggregated and serialized and then be sent over the network.
The Java Adapter SDK takes advantage of native Java Unicode support.
The C++ Adapter SDK includes the MChar and MStringData classes to encapsulate Unicode data. Their constructors allow specifying the encoding for the source data. A complete list of supported encodings can be found in MEncoding.h. You can also create an MWString instance from either class, which allows you to call string manipulation methods against your data.
The following classes are available in the C++ Adapter SDK:
MString and MWString are used when you need string manipulation methods to operate on the data.
MString can encapsulate single-byte character data, while MWString encapsulates Unicode (UTF-16) characters.
You must convert MString and MWString to MChar and MStringData before sending them on the network.
MStringData and MChar are used to encapsulate Unicode data.
Any source data strings are converted to Unicode by the Adapter SDK upon construction, as long as the source is in an encoding supported by the Adapter SDK and the encoding is provided to the constructor.
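The idea of supplying the source encoding at construction time can be illustrated with plain JDK classes. This is a minimal sketch and does not use the Adapter SDK API; the class name SourceDecodingSketch and its decode method are hypothetical names chosen for illustration. It shows that bytes become correct Unicode text only when the declared encoding matches the encoding the bytes were written in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: plain JDK classes, not the Adapter SDK API.
final class SourceDecodingSketch {

    // Decode raw bytes using the encoding the caller declares for them.
    // A Java String holds the result as UTF-16 code units.
    static String decode(byte[] sourceBytes, Charset declaredEncoding) {
        return new String(sourceBytes, declaredEncoding);
    }

    public static void main(String[] args) {
        // In Latin-1 the character U+00E9 (e with acute accent) is the single byte 0xE9.
        byte[] latin1Bytes = {(byte) 'c', (byte) 'a', (byte) 'f', (byte) 0xE9};
        String text = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(text.length());        // number of UTF-16 code units
        System.out.println((int) text.charAt(3)); // numeric value of the last character
    }
}
```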
Prespecifying Encoding
Custom adapters based on the TIBCO Adapter SDK automatically configure themselves to send or receive messages in ASCII/Latin-1 or in UTF-8 wire format, depending on how the associated server-based repository instance is configured.
If you are using TIBCO Designer 5.1.2 or later to prepare the configuration, you can set the encoding directly as an attribute of the adapter configuration.
The conversion from the internal Unicode data (in UTF-16) to the wire format encoding is accomplished by the serialization to an MTree instance.
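As a rough illustration of what converting internal Unicode text to a wire encoding means, the following sketch encodes the same string as Latin-1 and as UTF-8 using only the JDK. It does not show MTree serialization, and the class and method names are hypothetical. The point is that one internal string produces different byte sequences depending on the wire format.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: plain JDK classes, not the Adapter SDK serialization code.
final class WireEncodingSketch {

    // Encode internal (UTF-16) text as Latin-1 bytes.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encode internal (UTF-16) text as UTF-8 bytes.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // four characters, the last one outside ASCII
        System.out.println("Latin-1 bytes: " + toLatin1(text).length);
        System.out.println("UTF-8 bytes:   " + toUtf8(text).length);
    }
}
```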
Two adapters based on the SDK can communicate only if they use the same encoding on the wire. A problem arises if one adapter sends Latin-1 encoded messages to another adapter expecting UTF-8 encoded messages. Since the second adapter is expecting UTF-8 on the wire, Latin-1 characters are interpreted incorrectly.
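The mismatch described above can be reproduced with the JDK alone. The sketch below is not SDK code; WireMismatchSketch and sendAndReceive are hypothetical names. It encodes text with the sender's encoding and decodes the resulting bytes with the receiver's encoding, which is what effectively happens when the two adapters disagree about the wire format.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: simulates a sender and a receiver that may disagree
// about the wire encoding. No Adapter SDK classes are involved.
final class WireMismatchSketch {

    // Encode with the sender's encoding, then decode with the receiver's.
    static String sendAndReceive(String text, Charset senderEncoding, Charset receiverEncoding) {
        byte[] wireBytes = text.getBytes(senderEncoding);
        return new String(wireBytes, receiverEncoding);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";
        String matched = sendAndReceive(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = sendAndReceive(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("matched round trip intact:    " + original.equals(matched));
        System.out.println("mismatched round trip intact: " + original.equals(mismatched));
    }
}
```

Pure ASCII text survives the mismatch because ASCII bytes are identical in Latin-1 and UTF-8, which is why such problems often go unnoticed until non-ASCII data appears.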
The SDK C++ API takes the encoding value from the repository; no other regional settings affect it. There can be only one encoding per process or application. If multiple MApp application managers are running inside an SDK adapter, and each MApp connects to a different repository, all the repositories must have the same encoding value.
SDK-Internal C++ Unicode Type Conversion
This section gives an overview of how the C++ SDK performs conversion.
Internally, the C++ SDK first chooses one of two native implementations: Latin-1 for single-byte characters or UTF-16 for double-byte characters. Whether the SDK attempts a conversion, and which conversion it attempts, depends on the encoding argument passed to the MChar or MStringData constructor.
For all other cases, the SDK attempts a best-case conversion. If conversion is required (for example, UTF-16 to Latin-1), a replacement character is used for unmappable characters.
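The effect of a replacement character on unmappable input can be seen with the JDK's own converters. This sketch is not the SDK's conversion code, and the replacement character the SDK uses is not stated here; in the JDK, String.getBytes substitutes the target charset's default replacement, which for ISO-8859-1 is the question mark. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows JDK behavior for characters that have no
// Latin-1 equivalent. The Adapter SDK's replacement character may differ.
final class LossyConversionSketch {

    // Convert to Latin-1, substituting the charset's default replacement
    // byte for every character that cannot be represented.
    static byte[] toLatin1WithReplacement(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String price = "\u20AC5"; // euro sign followed by the digit 5
        byte[] converted = toLatin1WithReplacement(price);
        // The euro sign is not part of Latin-1, so its byte is the replacement.
        System.out.println(new String(converted, StandardCharsets.ISO_8859_1));
    }
}
```

Because the substitution is silent, the original character cannot be recovered from the converted bytes.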
If Unicode conversion to and from arbitrary encodings is required, a file (tibicudata32.dat) containing a lookup table is required.
Set the environment variable TIB_ICU_DATA to point to the directory that contains the tibicudata32.dat file. You must set the variable manually. If the SDK cannot find this file, it throws an exception when you attempt to convert certain types of string encodings.
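A start-up check along the following lines can report a missing data file before any conversion is attempted. This is a hedged sketch using only the JDK; it is not part of the SDK, and the class name IcuDataCheckSketch and the method hasIcuData are hypothetical. Only the variable name TIB_ICU_DATA and the file name tibicudata32.dat come from the text above.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative pre-flight check, not part of the Adapter SDK.
final class IcuDataCheckSketch {

    static final String DATA_FILE = "tibicudata32.dat";

    // Returns true if the given directory contains the ICU data file.
    static boolean hasIcuData(String directory) {
        if (directory == null || directory.isEmpty()) {
            return false; // variable unset or empty
        }
        return Files.isRegularFile(Paths.get(directory, DATA_FILE));
    }

    public static void main(String[] args) {
        String directory = System.getenv("TIB_ICU_DATA");
        if (hasIcuData(directory)) {
            System.out.println("Found " + DATA_FILE + " in " + directory);
        } else {
            System.out.println(DATA_FILE + " not found; check TIB_ICU_DATA");
        }
    }
}
```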
You can find the tibicudata32.dat file in the TIBCO Runtime Agent config/g11n directory. This directory also contains a tibicudata.dat file for backward compatibility with versions prior to SDK 5.3.
This release of the SDK uses version 3.2 of ICU (International Components for Unicode). Some common aliases from ICU are shown in the ICU Converter Explorer available at http://icu-project.org/icu-bin/convexp.
However, to maintain backward compatibility, use only the Adapter SDK encoding types listed in SDK_HOME\include\MEncoding.h, not the common alias names listed on that site.
Specifying the Wire Format Encoding
The wire format encoding for messages affects all communications for adapter applications. Either Latin-1 or UTF-8 is supported as the wire format encoding when the adapter application is using a server-based project repository.
If the project uses only ASCII or Latin-1 data, you can set the encoding to be Latin-1, which makes the custom adapter run faster. Otherwise, use UTF-8.
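One way to decide between the two wire formats is to test representative sample data against a Latin-1 encoder. The sketch below uses only the JDK; the class and method names are hypothetical, and the returned strings are display labels, not values accepted by any TIBCO configuration file.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: suggests a wire format from sample data.
final class WireFormatChoiceSketch {

    // Returns "Latin-1" if every sample fits in Latin-1, otherwise "UTF-8".
    // The returned strings are labels for display, not configuration values.
    static String chooseWireFormat(Iterable<String> samples) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        for (String sample : samples) {
            if (!latin1.canEncode(sample)) {
                return "UTF-8";
            }
        }
        return "Latin-1";
    }

    public static void main(String[] args) {
        System.out.println(chooseWireFormat(Arrays.asList("order", "caf\u00E9")));
        System.out.println(chooseWireFormat(Arrays.asList("order", "\u6771\u4EAC")));
    }
}
```

The result is only as reliable as the samples: a single later message containing a character outside Latin-1 would require UTF-8.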
Specifying Encoding for Server-Based Repositories
All project repositories managed by a particular administration server use the same encoding. You can specify the wire format encoding as a server property (repo.encoding) in the tibcoadmin.tra file. To change the wire format encoding, shut down the server, edit the tibcoadmin.tra file and then restart the server.
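To confirm which value a server is configured with, the property can be read programmatically. The sketch below assumes that the tibcoadmin.tra file consists of key=value lines that java.util.Properties can parse; that format, the sample value UTF-8, and the class and method names are assumptions for illustration, and only the property name repo.encoding comes from the text above. Lines containing backslashes, such as Windows paths, may not be read faithfully by this parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Illustrative only: assumes key=value property lines.
final class RepoEncodingSketch {

    // Returns the repo.encoding value, or null if the property is absent.
    static String readRepoEncoding(String traText) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(traText));
        return properties.getProperty("repo.encoding");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file content; the value shown is illustrative.
        String sample = "# sample content\nrepo.encoding=UTF-8\n";
        System.out.println(readRepoEncoding(sample));
    }
}
```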
One reason for choosing a particular encoding may be consistency with another TIBCO application that uses a fixed encoding.
Specifying Encoding for File-Based Repository
At some stages of a project, you may use a file-based repository. In that case, the encoding can be set in the project repository file itself. You can make the change using the Repository Finder in TIBCO Designer or by editing the .dat file directly.
Add the instance property below:
<instanceInfoProperty name="encoding" value="desired_encoding"/>
Note that if this repository instance is later managed by a repository server, the encoding used by the server overrides the encoding of the file.
The encoding affects only communication; it has no effect on the persistent storage of the data. TIBCO Administrator stores data in UTF-8 format regardless of the wire format encoding being used.
How TIBCO Administrator Determines Encoding
When an adapter application starts, the TIBCO Administrator client library forces both the instance name and discovery subject to conform to ASCII so that communication works with either encoding.
When the client is actually connecting to a server-based project repository for the first time, the encoding used by the server for that instance determines the encoding type for all TIBCO messages. The server encoding is determined by the repo.encoding parameter in the tibcoadmin.tra file.
All communicating applications must use the same wire format encoding. Therefore, all project repositories in use by applications that communicate with each other must use the same encoding. To understand the use of encoding formats, consider the following scenarios.
Figure 22 Scenarios of Encoding Formats
A client application with an embedded TIBCO Administrator client attempts to connect to two administration servers. In Scenario 1, the two servers use different encodings. In Scenario 2, the two servers use the same encodings.
The components interact as follows:
1. The client application discovers available administration servers and instances by sending a discovery message on the network. In the message, the server name, instance name, and discovery subject are restricted to ASCII characters only.
2.
3. In Scenario 1, the client application connects first to InstanceA through Server A. Server A uses Latin-1 as the repo.encoding property because the text is Latin-1. The client is now forced to use Latin-1 as the wire format encoding.
   When the client attempts to connect to Server B (which is using UTF-8), an exception is signalled because a lossy conversion would result.
4. In Scenario 2, the client application connects first to Server A (which is using UTF-8). When the client then attempts to connect to Server B, it succeeds.
Once the client’s encoding is established, an exception is thrown when trying to connect to a server that uses a different encoding.
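The rule that the first connection fixes the client's encoding can be modeled in a few lines. This is a conceptual sketch, not the TIBCO Administrator client library: the class ClientEncodingSketch, its methods, and the use of IllegalStateException are all hypothetical, since the actual exception type is not named here.

```java
// Conceptual model only: not the TIBCO Administrator client library.
final class ClientEncodingSketch {

    private String establishedEncoding; // null until the first connection

    // The first connection fixes the client's wire encoding; later
    // connections must use the same encoding or are rejected.
    void connect(String serverName, String serverEncoding) {
        if (establishedEncoding == null) {
            establishedEncoding = serverEncoding;
            return;
        }
        if (!establishedEncoding.equalsIgnoreCase(serverEncoding)) {
            throw new IllegalStateException(
                serverName + " uses " + serverEncoding
                + " but the client is already using " + establishedEncoding);
        }
    }

    String establishedEncoding() {
        return establishedEncoding;
    }

    public static void main(String[] args) {
        ClientEncodingSketch client = new ClientEncodingSketch();
        client.connect("Server A", "Latin-1");
        try {
            client.connect("Server B", "UTF-8"); // Scenario 1: different encodings
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```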
