validUTF8
Check if a Character Vector is Validly Encoded
Description
Verifies that a character vector is composed of validly encoded bytes.
Usage
validUTF8(x)
validEnc(x)
Arguments
| x | a character vector. |
Details
  Not all byte sequences are valid UTF-8 byte sequences.  For example,
  it is impossible to have a UTF-8 byte sequence consisting of a single
  byte greater than 0x7F, because UTF-8 reserves these bytes as part of
  multi-byte characters.  In Spotfire Enterprise Runtime for R, it is possible to construct strings
  with the "UTF-8" encoding that are not valid UTF-8 byte sequences.
- validUTF8 tests whether the elements of a string vector have
  valid UTF-8 byte sequences.
- validEnc tests whether the elements of a string vector are
  valid according to their declared encoding. Any string with encoding
  "latin1" or "bytes" is valid, because these encodings allow any byte
  sequence.  Strings with encoding "unknown" or "UTF-8" are valid only
  if they contain a valid UTF-8 byte sequence.
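The distinction between the raw bytes and the declared encoding can be illustrated with a short sketch. It assumes the behavior described above and uses only base functions (charToRaw, Encoding); the variable names e_utf8 and lone are illustrative and do not come from this help page.

```r
# "\u00E9" (e with an acute accent) is stored as two bytes in UTF-8.
e_utf8 <- "\u00E9"
charToRaw(e_utf8)       # the two bytes c3 a9
validUTF8(e_utf8)       # TRUE

# The single byte 0xE9 is greater than 0x7F, so on its own it is
# not a valid UTF-8 byte sequence.
lone <- "\xE9"
validUTF8(lone)         # FALSE

# Declaring the string as "latin1" makes it valid for validEnc,
# but validUTF8 still looks only at the bytes.
Encoding(lone) <- "latin1"
validEnc(lone)          # TRUE
validUTF8(lone)         # FALSE
```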
Value
  
| validUTF8 | returns a logical vector with one element per input string, TRUE for each string whose bytes form a valid UTF-8 byte sequence. |
| validEnc | returns a logical vector with one element per input string, TRUE for each string whose bytes are valid according to its declared encoding. |
 
See Also
Examples
x <- c("aa", "aa\u30A4", "\xFF")
Encoding(x) <- "UTF-8"
validUTF8(x) ## [1]  TRUE  TRUE FALSE
validEnc(x)  ## [1]  TRUE  TRUE FALSE
Encoding(x) <- "bytes"
validUTF8(x) ## [1]  TRUE  TRUE FALSE
validEnc(x)  ## [1] TRUE TRUE TRUE
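
One common use of the logical result is to separate valid strings from invalid ones before further processing. The following is a minimal sketch of that idea; the names ok and clean are hypothetical, and it relies only on ordinary logical indexing.

```r
x <- c("aa", "aa\u30A4", "\xFF")

# Flag the elements whose bytes are valid UTF-8.
ok <- validUTF8(x)

# Keep only the valid elements, and report the positions of the rest.
clean <- x[ok]
bad_positions <- which(!ok)
```

Here clean holds the first two elements and bad_positions identifies the third, matching the FALSE reported for "\xFF" in the examples above.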