Strings and Character Encodings
Rendezvous software uses strings in several roles:
• String data inside message fields
• Field names
• Subject names (and other associated strings that are not strictly inside the message)
• Certified delivery (CM) correspondent names
• Group names (fault tolerance)
Java programs represent all these strings as Unicode, using 2-byte (16-bit) character values.
• Before sending an outbound message, Rendezvous software translates these strings into the character encoding appropriate to the ISO locale.
• Conversely, when extracting a string from an inbound message, Rendezvous software translates it to Unicode.
Rendezvous translates its strings as if the message used the default encoding (see Default Encoding, below). This assumption is not always correct (see Inbound Translation, below).
The default encoding depends on the locale where Java is running. That is, the locale determines the value of the Java system property file.encoding, which in turn determines the translation scheme.
For example, the United States is locale en_US, and uses the Latin-1 character encoding (also called ISO 8859-1); Japan is locale ja_JP, and uses the Shift-JIS character encoding.
When the system property file.encoding is inaccessible, the default encoding is 8859-1 (Latin-1). Programs can override this system property; for details, see TibrvMsg.setStringEncoding().
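The fallback rule described above can be illustrated with standard Java alone. The following is a minimal sketch, not Rendezvous code: the class DefaultEncodingSketch and its methods are hypothetical names, and treating an unrecognized encoding name the same as an inaccessible property is an assumption made here for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the Rendezvous API.
final class DefaultEncodingSketch {

    // Map an encoding name to a Charset, falling back to Latin-1 (ISO 8859-1).
    // Assumption: an unknown or malformed name is handled like a missing one.
    static Charset resolve(String name) {
        if (name == null || name.isEmpty()) {
            return StandardCharsets.ISO_8859_1;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and UnsupportedCharsetException.
            return StandardCharsets.ISO_8859_1;
        }
    }

    // Read file.encoding; treat a security restriction as "inaccessible".
    static Charset defaultEncoding() {
        String name;
        try {
            name = System.getProperty("file.encoding");
        } catch (SecurityException e) {
            name = null;
        }
        return resolve(name);
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + defaultEncoding().name());
    }
}
```

The printed name depends on the JVM and the locale in which it runs.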
Outbound Translation
Outbound translation from Unicode to a specified encoding occurs when adding a string to a message.
A wire-format string can contain only characters that are valid in the encoding of the surrounding message. The translation procedure detects characters that cannot be represented in that encoding, and throws an exception with the status TibrvStatus.INVALID_ENCODING.
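To show what detecting an unrepresentable character looks like, the sketch below performs a strict encode using only the standard java.nio.charset classes. It is an analogy, not the Rendezvous implementation: OutboundCheckSketch and encodeStrict are hypothetical names, and the standard-library CharacterCodingException stands in for the Rendezvous exception carrying TibrvStatus.INVALID_ENCODING.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the Rendezvous API.
final class OutboundCheckSketch {

    // Encode s into cs, failing instead of silently substituting '?'
    // when a character has no representation in the target encoding.
    static byte[] encodeStrict(String s, Charset cs) throws CharacterCodingException {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        try {
            // Japanese text cannot be represented in Latin-1.
            encodeStrict("\u65e5\u672c\u8a9e", StandardCharsets.ISO_8859_1);
            System.out.println("encoded");
        } catch (CharacterCodingException e) {
            System.out.println("not representable in Latin-1: " + e);
        }
    }
}
```

A program could run a check like this before adding a string to a message, so that it can report the offending text in its own terms rather than only handling the exception afterward.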
Inbound Translation
Inbound translation occurs before the program receives the data.
Automatic inbound translation is correct when two programs exchange messages within the same locale.
Warning
In contrast, the automatic translation might be incorrect when the sender and receiver use different character encodings. In this situation, the receiver must explicitly retranslate to the local encoding.
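One general Java technique for this kind of retranslation is to recover the original bytes by re-encoding the string with the encoding that was wrongly assumed, then decode those bytes with the encoding the sender actually used. The sketch below is an illustration under stated assumptions, not a Rendezvous API: RetranslateSketch and retranslate are hypothetical names, the receiver is assumed to know the sender's encoding, and the wrongly assumed encoding is taken to be Latin-1, which maps every byte value to a character and therefore loses nothing. With other assumed encodings the round trip may not be lossless. The demonstration uses UTF-8 as the sender's encoding because every JVM supports it; another encoding such as Shift-JIS could be passed instead where the JVM provides it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the Rendezvous API.
final class RetranslateSketch {

    // Undo a decode that used the wrong encoding:
    // 1. re-encode with the assumed encoding to get the wire bytes back,
    // 2. decode those bytes with the sender's actual encoding.
    static String retranslate(String misdecoded, Charset assumed, Charset actual) {
        byte[] wireBytes = misdecoded.getBytes(assumed);
        return new String(wireBytes, actual);
    }

    public static void main(String[] args) {
        String sent = "\u65e5\u672c\u8a9e";
        // Simulate a receiver that decoded the sender's UTF-8 bytes as Latin-1.
        String received = new String(sent.getBytes(StandardCharsets.UTF_8),
                                     StandardCharsets.ISO_8859_1);
        String fixed = retranslate(received,
                                   StandardCharsets.ISO_8859_1,
                                   StandardCharsets.UTF_8);
        System.out.println("recovered original: " + fixed.equals(sent));
    }
}
```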