Copyright © TIBCO Software Inc. All Rights Reserved



Character Encoding in Messages
Character encodings are named sets of numeric values for representing characters. For example, ISO 8859-1, also known as Latin-1, is the character encoding containing the letters and symbols used by most Western European languages. If your applications are sending and receiving messages that use only English language characters (that is, the ASCII character set), you do not need to alter your programs to handle different character encodings. The EMS server and application APIs automatically handle ASCII characters in messages.
Character encodings become important when your application handles messages that contain non-ASCII characters (such as Japanese text). Also, clients encode messages by default as UTF-8. Some character encodings use only one byte to represent each character, but UTF-8 can use two or more bytes to represent the same character. For example, Latin-1 is a single-byte character encoding. If all strings in your messages contain only characters that appear in the Latin-1 encoding, you can potentially improve performance by specifying Latin-1 as the encoding for strings in the message.
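The size difference described above can be checked with the standard Java library alone. The following is a minimal sketch, not EMS code: the class and method names are hypothetical, and it only compares how many bytes the same string occupies in Latin-1 and in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; it does not use the EMS API.
final class EncodingSizes {
    // Number of bytes needed to store the text in ISO 8859-1 (Latin-1).
    static int latin1Length(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Number of bytes needed to store the same text in UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "order";        // ASCII only: one byte per character in both encodings
        String western = "caf\u00e9";  // "cafe" with e-acute: in Latin-1, but not in ASCII
        System.out.println(latin1Length(ascii) + " vs " + utf8Length(ascii));     // 5 vs 5
        System.out.println(latin1Length(western) + " vs " + utf8Length(western)); // 4 vs 5
    }
}
```

For ASCII-only strings the two encodings produce the same number of bytes, so any saving comes only from non-ASCII characters that the single-byte encoding can represent.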
EMS clients can specify a variety of common character encodings for strings in messages. The character encoding for a message applies to strings that appear in any of the following places within a message:
MapMessage field names and values
The EMS client APIs (Java, .NET and C) include mechanisms for handling strings and specifying the character encoding used for all strings within a message. The following sections describe the implications of string character encoding for EMS clients.
Supported Character Encodings
Each message contains the name of the character encoding used to encode strings within the message. This character encoding name is one of the canonical names for character encodings contained in the Java specification. You can obtain a list of canonical character encoding names from the java.sun.com website.
Java and .NET clients use these canonical character encoding names when setting or retrieving the character encoding names.
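A Java program can look up canonical encoding names at run time instead of consulting a published list. The sketch below uses only the standard java.nio.charset API, and the class name is hypothetical. Java defines canonical names for both the java.nio and the java.io APIs, and this section does not say which form EMS expects. The use of java.nio names here is therefore an assumption to confirm against the API reference.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; it does not use the EMS API.
final class CanonicalNames {
    // Resolves a name or alias (for example "latin1") to the canonical
    // java.nio name. Charset.forName throws an unchecked exception if
    // this JVM does not recognize the name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("latin1")); // ISO-8859-1
        System.out.println(canonicalName("UTF8"));   // UTF-8

        // Lists every canonical name supported by this JVM.
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}
```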
Sending Messages
When a client sends a message, the message stores the character encoding name used for strings in that message. Java clients represent strings using Unicode. A message created by a Java client that does not specify an encoding will use UTF-8 as the named encoding within the message. UTF-8 uses up to four bytes to represent each character, so a Java client can improve performance by explicitly using a single-byte character encoding, if possible.
Java clients can set the encoding globally with the setEncoding method, or set it for an individual message with the setMessageEncoding method. For more information about these methods, see the TIBCO Enterprise Message Service Java API Reference.
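Before a client chooses a single-byte encoding, it should confirm that every string in the message can be represented in that encoding. The following sketch shows one way to make that check with the standard library. The class and method names are hypothetical, and the code deliberately stops short of calling the EMS methods named above, because this section does not give their signatures. The returned name is what a client would pass to them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; it does not use the EMS API.
final class EncodingChooser {
    // Returns "ISO-8859-1" when every string fits in Latin-1,
    // otherwise falls back to "UTF-8".
    static String chooseEncodingName(String... strings) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        for (String s : strings) {
            // Encoders are not thread-safe, so create a fresh one per check.
            if (!latin1.newEncoder().canEncode(s)) {
                return StandardCharsets.UTF_8.name();
            }
        }
        return latin1.name();
    }

    public static void main(String[] args) {
        // Western European text fits in Latin-1.
        System.out.println(chooseEncodingName("caf\u00e9", "order"));          // ISO-8859-1
        // Japanese text does not, so the helper falls back to UTF-8.
        System.out.println(chooseEncodingName("order", "\u65e5\u672c\u8a9e")); // UTF-8
    }
}
```

The check has its own cost for each string, so it is most useful when the set of strings is known in advance or when the same encoding decision can be reused for many messages.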
Typically, C clients manipulate strings using the character encoding of the machine on which they are running. The EMS C client library itself does not encode or decode characters. When sending a message, an EMS C client application can use tibemsMsg_SetEncoding to record in the message the encoding used for its strings. When receiving a message, an EMS C client application can retrieve that encoding with tibemsMsg_GetEncoding. Use a third-party library to do the actual decoding based on the retrieved encoding information.
