getValidUtf8
Valid UTF-8 Encoding
Description
Converts strings to valid UTF-8 encoded strings.
Usage
getValidUtf8(x)
isValidUtf8(x)
Arguments
x | a character vector. |
Details
You can construct strings with the "UTF-8" encoding that are
not valid UTF-8 byte sequences. TIBCO Enterprise Runtime for R converts
these strings internally to valid UTF-8 sequences when it needs to manipulate them.
The getValidUtf8 function does this conversion explicitly, producing
strings with valid UTF-8 byte sequences.
If all of a converted string's bytes are ASCII (less than 0x80), the string
has the encoding "unknown". Otherwise, the string has the encoding
"UTF-8".
isValidUtf8 tests whether a string has "UTF-8" or "unknown"
encoding and a valid UTF-8 byte sequence.
Value
getValidUtf8 | returns a character vector similar to the input where
all of the elements have been converted to strings with "UTF-8" or
"unknown" encoding, and with valid UTF-8 byte sequences. |
isValidUtf8 | returns a logical vector similar to the input, in which
an element is TRUE if the corresponding string has "UTF-8" or
"unknown" encoding and a valid UTF-8 byte sequence. |
See Also
Examples
x <- rawToChar(as.raw(c(0x61,0xC4,0x61)))
Encoding(x) <- 'UTF-8'
isValidUtf8(x) # returns FALSE
charToRaw(x) # prints [1] 61 c4 61
y <- getValidUtf8(x)
isValidUtf8(y) # returns TRUE
charToRaw(y) # prints [1] 61 c3 84 61
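# A further sketch, assuming both functions operate element by element on a
# character vector, as the Value section describes. The name z is illustrative,
# and results are not shown because they were not verified.
z <- c("abc", x) # a pure-ASCII string followed by the invalid string from above
isValidUtf8(z) # one logical value per element of z
Encoding(getValidUtf8(z)) # per Details: "unknown" for ASCII-only results, "UTF-8" otherwise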