Valid UTF-8 Encoding

getValidUtf8

Description

Converts strings to valid UTF-8 encoded strings, or tests whether strings are already valid UTF-8.

Usage

getValidUtf8(x)
isValidUtf8(x)

Arguments

x	a character vector.

Details

You can construct strings with the "UTF-8" encoding that are not valid UTF-8 byte sequences. TIBCO Enterprise Runtime for R converts these strings internally to valid UTF-8 sequences when it needs to manipulate them. The getValidUtf8 function does this conversion explicitly, producing strings with valid UTF-8 byte sequences.

If all of a converted string's bytes are ASCII characters (less than 0x80), then the string has the encoding "unknown". Otherwise, the string has the encoding "UTF-8".

isValidUtf8 tests whether a string has "UTF-8" or "unknown" encoding and a valid UTF-8 byte sequence.

Value

getValidUtf8	returns a character vector like the input, with every element converted to a string that has "UTF-8" or "unknown" encoding and a valid UTF-8 byte sequence.
isValidUtf8	returns a logical vector corresponding to the input, with TRUE for each element that has "UTF-8" or "unknown" encoding and a valid UTF-8 byte sequence.

See Also

Encoding.

Examples

x <- rawToChar(as.raw(c(0x61,0xC4,0x61)))
Encoding(x) <- 'UTF-8'
isValidUtf8(x) # returns FALSE
charToRaw(x) # prints [1] 61 c4 61
y <- getValidUtf8(x)
isValidUtf8(y) # returns TRUE
charToRaw(y) # prints [1] 61 c3 84 61
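
The following sketch applies both functions to a vector with several elements to illustrate the encoding rules described in Details. The variable names and byte values are illustrative assumptions, and the outcomes noted in the comments are what the descriptions above imply, not captured output.

```r
# Three strings built from raw bytes (byte values chosen for illustration only).
ascii   <- rawToChar(as.raw(c(0x61, 0x62, 0x63)))       # "abc": all bytes below 0x80
valid   <- rawToChar(as.raw(c(0x61, 0xC3, 0x84, 0x61))) # well-formed two-byte sequence c3 84
invalid <- rawToChar(as.raw(c(0x61, 0xC4, 0x61)))       # lead byte 0xC4 with no continuation byte
Encoding(valid)   <- 'UTF-8'
Encoding(invalid) <- 'UTF-8'

x <- c(ascii, valid, invalid)
isValidUtf8(x)       # one logical per element; only the last should be FALSE
y <- getValidUtf8(x)
Encoding(y)          # per Details: "unknown" for the ASCII-only element, "UTF-8" for the others
all(isValidUtf8(y))  # should be TRUE after conversion
```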

Package terrUtils version 6.0.0-69