normalizeUnicode
Normalize Unicode Characters
Description
Convert a vector of strings using one of several
defined types of Unicode normalization.
Usage
normalizeUnicode(x, form = "NFC")
Arguments
| x | a vector of strings. |
| form | a character string specifying the type of Unicode normalization to be used.  Should be one of the strings "NFC", "NFD", "NFKC", "NFKD", "NFKC_CF" or "NFKC_Casefold". |
Details
Unicode allows multiple character sequences to represent the same
string.  For example, the character "capital A with diaeresis" (two
dots) can be represented as the single character "\u00C4", or as the
two characters "A\u0308" (the letter A followed by a combining
diaeresis).  The Unicode standard defines several ways to
"normalize" a Unicode string so that different representations of a
given string map to the same "canonical form" (see
http://unicode.org/reports/tr15/).  Normalizing Unicode strings is
necessary in order to consistently compare or sort strings in
languages with accented characters.
Each string is converted to UTF-8 before normalization, and the
resulting strings all have the UTF-8 encoding.
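The two representations described above can be inspected with base R alone. The sketch below is illustrative only: it shows that the precomposed and decomposed spellings differ in code points and length, so they compare unequal before normalization. The final step assumes that normalizeUnicode() is already available in the session (the providing package is not named here), and is skipped otherwise.

```r
# Base R only: the two spellings of "capital A with diaeresis".
precomposed <- enc2utf8("\u00C4")   # one code point, U+00C4
decomposed  <- enc2utf8("A\u0308")  # "A" plus combining diaeresis U+0308

utf8ToInt(precomposed)              # 196
utf8ToInt(decomposed)               # 65 776
nchar(precomposed, type = "chars")  # 1
nchar(decomposed, type = "chars")   # 2
precomposed == decomposed           # FALSE: not equal without normalization

# Assumption: normalizeUnicode() has been loaded from its package.
# After normalizing both spellings to the same form, they can be compared.
if (exists("normalizeUnicode", mode = "function")) {
  nfc <- normalizeUnicode(c(precomposed, decomposed), "NFC")
  print(nfc[1] == nfc[2])
}
```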
Value
 A vector of strings, with each element of x converted according
to the specified normalization form.  Attributes from x are
copied to the output value.
Examples
all.equal(normalizeUnicode('\u212B','NFC'), '\u00C5')
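A further sketch, under the same assumption that normalizeUnicode() is loaded in the session, applies several of the documented forms to a small named vector. The input strings and the names in x are made up for illustration. Under the Unicode standard, the compatibility forms (NFKC, NFKD) replace the ligature U+FB01 with the two letters "fi", and the case-folding form additionally folds case. Because attributes are copied to the result, the names of x should be carried over.

```r
# Illustrative input: a precomposed letter, a compatibility ligature,
# and a mixed-case word containing a sharp s.
x <- c(precomposed = "\u00C4", ligature = "\uFB01", mixed = "Stra\u00DFe")

# Assumption: normalizeUnicode() has been loaded from its package;
# the block is skipped if it is not available.
if (exists("normalizeUnicode", mode = "function")) {
  for (form in c("NFC", "NFD", "NFKC", "NFKD", "NFKC_Casefold")) {
    y <- normalizeUnicode(x, form)
    # Character counts change when a form decomposes or expands a character.
    cat(form, ":", nchar(y, type = "chars"), "\n")
    # Names are an attribute of x, so they should be present on y.
    print(names(y))
  }
}
```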