validUTF8
Check if a Character Vector is Validly Encoded

Description

Verifies that a character vector is composed of validly encoded bytes.

Usage

validUTF8(x)
validEnc(x)

Arguments

x    a character vector.

Details

Not all byte sequences are valid UTF-8 byte sequences. For example, it is impossible to have a UTF-8 byte sequence consisting of a single byte greater than 0x7F, because UTF-8 reserves these bytes as part of multi-byte characters. In TIBCO Enterprise Runtime for R, it is possible to construct strings with the "UTF-8" encoding that are not valid UTF-8 byte sequences.
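
The point above can be illustrated with a minimal sketch. It assumes that rawToChar and as.raw behave as they do in open-source R; the variable names bad and good are illustrative only. It builds one string from a lone byte above 0x7F and one from the complete three-byte sequence for U+30A4, declares both as "UTF-8", and checks each.

```r
# Assumption: rawToChar() and as.raw() behave as in open-source R.

# A lone byte above 0x7F is not a complete UTF-8 sequence on its own.
bad <- rawToChar(as.raw(0xE3))
Encoding(bad) <- "UTF-8"   # the declaration succeeds even though the byte is invalid

# The complete three-byte sequence E3 82 A4 encodes U+30A4 and is valid.
good <- rawToChar(as.raw(c(0xE3, 0x82, 0xA4)))
Encoding(good) <- "UTF-8"

validUTF8(c(bad, good))  ## [1] FALSE  TRUE
```

Declaring an encoding only marks the string; it does not validate the bytes, which is why a separate check such as validUTF8 is useful.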

Value

validUTF8 returns a logical vector similar to the input with TRUE values for the strings whose bytes are valid UTF-8 byte sequences.
validEnc returns a logical vector similar to the input with TRUE values for the strings whose bytes are valid according to their declared encoding.

See Also

Encoding.

Examples

x <- c("aa", "aa\u30A4", "\xFF")
Encoding(x) <- "UTF-8"
validUTF8(x) ## [1]  TRUE  TRUE FALSE
validEnc(x)  ## [1]  TRUE  TRUE FALSE

Encoding(x) <- "bytes"
validUTF8(x) ## [1]  TRUE  TRUE FALSE
validEnc(x)  ## [1]  TRUE  TRUE  TRUE

Package base version 6.0.0-69