getValidUtf8
Valid UTF-8 Encoding
Description
Convert strings to valid UTF-8 encoded strings, or test whether strings are
already valid UTF-8.
Usage
getValidUtf8(x)
isValidUtf8(x)
Arguments
x    a character vector.
Details
A string can be marked with the "UTF-8" encoding even though its bytes do
not form a valid UTF-8 byte sequence. TIBCO Enterprise Runtime for R converts
such strings internally to valid UTF-8 sequences when it needs to manipulate
them. The getValidUtf8 function performs this conversion explicitly, and
isValidUtf8 tests whether a string is already a valid UTF-8
string.
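The following is a minimal base R sketch of the same idea, not the TIBCO Enterprise Runtime for R implementation. It uses the base functions validUTF8 and iconv, and it assumes that the invalid bytes should be read as Latin-1, which is consistent with the result shown in the Examples section but is not stated by this page. The byte 0xC4 announces a two-byte UTF-8 sequence, so it is invalid when followed by 0x61, which is not a continuation byte.

```r
# Build "a", a lone 0xC4 lead byte, "a": not a valid UTF-8 byte sequence.
x <- rawToChar(as.raw(c(0x61, 0xC4, 0x61)))

# Base R validity test (an analogue of isValidUtf8, not the same function).
validUTF8(x)   # FALSE

# ASSUMPTION: reinterpret every byte as Latin-1 and re-encode as UTF-8.
# 0xC4 is U+00C4 in Latin-1, which UTF-8 encodes as the two bytes c3 84.
y <- iconv(x, from = "latin1", to = "UTF-8")

validUTF8(y)   # TRUE
charToRaw(y)   # 61 c3 84 61
```

This sketch re-encodes the whole string, so a string that mixes valid multi-byte UTF-8 sequences with stray bytes would have its valid sequences re-encoded as well. It is only suitable for illustrating the single-byte case above.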
Value
getValidUtf8 returns a character vector similar to the input,
where all of the elements have been converted to UTF-8 encoded strings
with valid UTF-8 byte sequences.
isValidUtf8 returns a logical vector similar to the input, with
TRUE for each element that is a UTF-8 encoded string with a valid
UTF-8 byte sequence, and FALSE otherwise.
See Also
Examples
# 0xC4 starts a two-byte UTF-8 sequence, but 0x61 is not a continuation byte
x <- rawToChar(as.raw(c(0x61, 0xC4, 0x61)))
Encoding(x) <- "UTF-8"
isValidUtf8(x) # returns FALSE
charToRaw(x) # prints [1] 61 c4 61
y <- getValidUtf8(x)
isValidUtf8(y) # returns TRUE
charToRaw(y) # prints [1] 61 c3 84 61