Latin Normalization in Character Sets

In Unicode, Latin characters (letters, digits, and punctuation such as commas) can be expressed in two code point ranges:

  • The commonly used Unicode range 0021-007E, called canonical in this guide
  • Full-width Latin, which is Unicode range FF01-FF5E

Full-width Latin characters have wider glyphs, which make the characters look more natural when they are used together with characters in certain Asian languages.
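The two ranges listed above have the same length and the same ordering, so each canonical character and its full-width counterpart differ by a constant offset (FF01 minus 0021, which is FEE0). The following Python sketch illustrates that relationship only. The function name to_full_width is hypothetical and is not part of the Plug-in.

```python
import unicodedata

# Distance between the canonical range (0021-007E) and the
# full-width range (FF01-FF5E).
OFFSET = 0xFF01 - 0x0021  # 0xFEE0


def to_full_width(ch: str) -> str:
    """Return the full-width form of a canonical Latin character.

    Characters outside 0021-007E are returned unchanged.
    Hypothetical helper, for illustration only.
    """
    cp = ord(ch)
    if 0x0021 <= cp <= 0x007E:
        return chr(cp + OFFSET)
    return ch


# The first and last characters of one range map to the first and
# last characters of the other.
assert to_full_width("!") == "\uFF01"
assert to_full_width("~") == "\uFF5E"
assert unicodedata.name(to_full_width("A")) == "FULLWIDTH LATIN CAPITAL LETTER A"
```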

Also, the Unicode space character exists in two forms:

  • Space character 0020, called canonical in this guide
  • IDEOGRAPHIC SPACE 3000, which is the full-width version
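The space character is listed separately because it does not sit at the same distance from its full-width form as the characters in the 0021-007E range do. The following Python sketch shows this with the standard unicodedata module. The constant names are illustrative assumptions.

```python
import unicodedata

CANONICAL_SPACE = "\u0020"
FULL_WIDTH_SPACE = "\u3000"

# Official Unicode names of the two space characters.
assert unicodedata.name(CANONICAL_SPACE) == "SPACE"
assert unicodedata.name(FULL_WIDTH_SPACE) == "IDEOGRAPHIC SPACE"

# East Asian Width property: "Na" is narrow, "F" is full-width.
assert unicodedata.east_asian_width(CANONICAL_SPACE) == "Na"
assert unicodedata.east_asian_width(FULL_WIDTH_SPACE) == "F"

# The constant offset that links 0021-007E to FF01-FF5E does not
# lead from 0020 to 3000, so the space needs its own mapping entry.
LATIN_OFFSET = 0xFF01 - 0x0021
assert ord(CANONICAL_SPACE) + LATIN_OFFSET != ord(FULL_WIDTH_SPACE)
```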

Some character sets are available in two forms, for example, CCSID 300 and CCSID 300 with Latin normalization. The two forms exist for character sets that contain the full-width equivalents of the Latin characters instead of the conventional Latin characters.

During rendering, the character set named XXX with Latin normalization first converts characters in the canonical Latin range to the full-width range, and then obtains bytes according to the specification for the character set XXX.

Similarly, during parsing, the character set XXX with Latin normalization first obtains the characters from the byte content according to the specification for the character set XXX, and then converts full-width Latin characters to canonical Latin characters.

Note: The Plug-in also converts the space characters between canonical and full-width versions in the same manner as it does for Latin characters.
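The rendering and parsing behavior described above, including the space conversion, can be sketched in Python as follows. This is a minimal illustration and not the Plug-in's implementation. The function names render and parse are hypothetical, and the utf-16-be codec is only a stand-in for the underlying character set XXX, because Python has no built-in codec for CCSID 300.

```python
# Map canonical Latin (0021-007E) to full-width Latin (FF01-FF5E).
_OFFSET = 0xFF01 - 0x0021
TO_FULL = {cp: cp + _OFFSET for cp in range(0x0021, 0x007F)}
# The space character has its own full-width form, IDEOGRAPHIC SPACE.
TO_FULL[0x0020] = 0x3000
# Reverse mapping, used when parsing.
TO_CANONICAL = {full: canon for canon, full in TO_FULL.items()}


def render(text: str, codec: str = "utf-16-be") -> bytes:
    """Convert canonical characters to full-width, then encode.

    'codec' is a stand-in for the underlying character set XXX.
    """
    return text.translate(TO_FULL).encode(codec)


def parse(data: bytes, codec: str = "utf-16-be") -> str:
    """Decode the bytes, then convert full-width characters to canonical."""
    return data.decode(codec).translate(TO_CANONICAL)


rendered = render("Hi 5")
# The bytes carry only full-width characters.
assert rendered.decode("utf-16-be") == "\uFF28\uFF49\u3000\uFF15"
# Parsing restores the canonical text.
assert parse(rendered) == "Hi 5"
```

Characters outside the two Latin ranges and the two space characters pass through both functions unchanged, so text in other scripts is not affected by the conversion.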

Note: If the character set you intend to use is available in two forms, your choice likely depends on the intended recipient of the data. Here are the considerations:
  • If you are parsing data that might contain Latin characters, the character set with Latin normalization provides better interoperability with systems and components that work best with canonical Latin data. For instance, some Windows fonts might not have glyphs for full-width Latin characters, in which case those characters are displayed as blocks.
  • If you are preparing data for a legacy program or component that specifically requires full-width Latin, the character set that does not perform Latin normalization would be a more appropriate choice.