Valid UTF-8 Encoding

getValidUtf8

Description

Converts strings to valid UTF-8 encoded strings, or tests whether strings are already valid UTF-8.

Usage

getValidUtf8(x)
isValidUtf8(x)

Arguments

x	a character vector.

Details

It is possible to construct strings that are marked with the "UTF-8" encoding but whose bytes are not valid UTF-8 byte sequences. TIBCO Enterprise Runtime for R converts these internally to valid UTF-8 sequences when it needs to manipulate them. The getValidUtf8 function performs this conversion explicitly, and isValidUtf8 tests whether a string is already a valid UTF-8 string.
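
To illustrate what an invalid byte sequence looks like, the following is a minimal sketch of a structural check written with basic R functions only. The helper name hasUtf8Structure is hypothetical and is not part of terrUtils; it is not how isValidUtf8 is implemented, and it checks only the lead-byte and continuation-byte pattern, so it is less strict than a full validator.

```r
# Hypothetical helper, not part of terrUtils. Checks only the lead-byte /
# continuation-byte structure of a raw vector. It does not reject overlong
# encodings, surrogates, or code points above U+10FFFF.
hasUtf8Structure <- function(bytes) {
  b <- as.integer(bytes)
  n <- length(b)
  i <- 1L
  while (i <= n) {
    lead <- b[i]
    # Number of continuation bytes that the lead byte calls for.
    if (lead < 0x80) {
      need <- 0L                      # 0xxxxxxx: single-byte (ASCII)
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      need <- 1L                      # 110xxxxx: two-byte sequence
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      need <- 2L                      # 1110xxxx: three-byte sequence
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      need <- 3L                      # 11110xxx: four-byte sequence
    } else {
      return(FALSE)                   # stray continuation or illegal lead byte
    }
    # The sequence must not run past the end of the input.
    if (i + need > n) return(FALSE)
    if (need > 0L) {
      cont <- b[(i + 1L):(i + need)]
      # Every continuation byte must have the form 10xxxxxx (0x80 to 0xBF).
      if (any(cont < 0x80 | cont > 0xBF)) return(FALSE)
    }
    i <- i + need + 1L
  }
  TRUE
}

hasUtf8Structure(as.raw(c(0x61, 0xC4, 0x61)))        # FALSE
hasUtf8Structure(as.raw(c(0x61, 0xC3, 0x84, 0x61)))  # TRUE
```

In the first call, 0xC4 announces a two-byte sequence, but the byte that follows, 0x61, is not a continuation byte, so the sequence is rejected.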

Value

getValidUtf8 returns a character vector similar to the input, where all of the elements have been converted to UTF-8 encoded strings with valid UTF-8 byte sequences.

isValidUtf8 returns a logical vector similar to the input, with TRUE values for the elements that were UTF-8 encoded strings with valid UTF-8 byte sequences.

See Also

Encoding.

Examples

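# Build a string whose bytes are not valid UTF-8: 0xC4 starts a two-byte
# sequence, but the next byte, 0x61, is not a continuation byte.
x <- rawToChar(as.raw(c(0x61,0xC4,0x61)))
Encoding(x) <- 'UTF-8'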
isValidUtf8(x) # returns FALSE
charToRaw(x) # prints [1] 61 c4 61
y <- getValidUtf8(x)
isValidUtf8(y) # returns TRUE
charToRaw(y) # prints [1] 61 c3 84 61
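
The bytes printed for y are consistent with the stray byte 0xC4 being read as the Latin-1 code point U+00C4 and re-encoded as two UTF-8 bytes. This is an observation about this one example, not a documented rule for getValidUtf8. The arithmetic can be sketched as follows; the helper name latin1ByteToUtf8 is hypothetical and is not part of terrUtils.

```r
# Hypothetical helper, not part of terrUtils: re-encode one byte in the
# range 0x80 to 0xFF as the two-byte UTF-8 form of the Latin-1 code point
# with the same value.
latin1ByteToUtf8 <- function(byte) {
  b <- as.integer(byte)
  stopifnot(length(b) == 1L, b >= 0x80, b <= 0xFF)
  # Two-byte UTF-8 layout: 110xxxxx 10yyyyyy, where the x bits are the
  # code point divided by 64 and the y bits are its low six bits.
  as.raw(c(0xC0 + b %/% 64L, 0x80 + b %% 64L))
}

latin1ByteToUtf8(as.raw(0xC4)) # prints [1] c3 84
```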

Package terrUtils version 4.0.0-28