scan
Input Data from a File or Connection

Description

Reads data from a connection or interactively from standard input. Options are available to control how the file is read and the structure of the data.

Usage

scan(file = "", what = double(), nmax = -1, n = -1, sep = "",
     quote = if (identical(sep, "\n")) "" else "'\"", dec = ".",
     skip = 0, nlines = 0, na.strings = "NA", flush = FALSE,
     fill = FALSE, strip.white = FALSE, quiet = FALSE,
     blank.lines.skip = TRUE, multi.line = TRUE, comment.char = "",
     allowEscapes = FALSE, fileEncoding = "", encoding = "unknown",
     text, skipNul = FALSE)

Arguments

file the file or connection object to be scanned. If file is missing or empty (""), data is read from standard input. In this case, scan prompts with the index for the next data item, and input can be terminated by a blank line. For more details on reading data from connections, see file.
what a vector of mode "logical", "integer", "numeric", "character", "raw" or "complex", or a list of vectors of these modes. The scan function reads all fields in the file as data of the same mode as what. Thus, what=character() or what="" reads data as character fields. If what is missing, scan interprets all fields as numeric.

If what is a list, then each record is considered to have length(what) fields, and the mode of each field is the mode of the corresponding component in what. A NULL component in the list causes the corresponding field to be skipped.

nmax an integer. Specifies the maximum number of data values to be read. When what is a list, it is the maximum number of records to be read. If not provided, not positive, or invalid, then scan reads to the end of the file.
n the maximum number of items to read from the file (the number of records times the fields per record). If omitted, the function reads to the end of file, or to an empty line if reading from standard input.
sep a single-character separator, often "\t" for tabs or "\n" for newlines. If omitted, any amount of white space (blanks, tabs, and possibly newlines) can separate fields. By default, sep="".
quote a character string listing the quote characters used in the file. Characters between a pair of identical quote characters are considered to be a single character string; the quote characters are not part of that string. As a special case, NULL can be used instead of "" to indicate that there are no quote characters. The default is "'\"" (both single and double quotation marks are quote characters), or "" when sep is "\n". Use "\"" to allow single quotes in strings (as in names).
dec the character used as the decimal point. It must be a character string containing exactly one single-byte character. NULL or a zero-length character vector is also accepted and is treated as the default (".").
skip the number of initial lines of the file that should be skipped prior to reading. By default, skip=0 and reading begins at the top of the file.
nlines an integer specifying the maximum number of lines of data to be read. It is ignored if zero or negative.
na.strings a character string vector. Elements in this vector are to be considered as NA.
flush a logical value. If flush=TRUE, then the scan function flushes to the end of the line after reading the last of the fields requested. This allows you to include comments that are not read by scan after the last field. It also prevents multiple sets of items from being placed on one line. By default, flush=FALSE.
fill a logical value. If TRUE, when any line has fewer fields than specified by what, scan adds empty fields to that line.
strip.white a vector of logical values corresponding to items in the what argument. The strip.white argument allows you to strip leading and trailing white space from character fields; scan always strips numeric fields in this way. If strip.white is not NULL, it must be either of length 1, in which case the single logical value tells whether to strip all fields read, or it must be the same length as what, in which case the logical vector tells which fields to strip. For example, if strip.white[1]=TRUE and field 1 is character, scan strips the leading and trailing white space from field 1. If you read free-format input by leaving sep unspecified, then strip.white has no effect.
quiet a logical value. If FALSE, scan prints a line reporting how many items have been read.
blank.lines.skip a logical value. If TRUE, blank lines are omitted, except when counting skip and nlines.
multi.line a logical value. Used only when what is a list. If multi.line=FALSE, then all fields must appear on one line. If scan reaches the end of a line without reading all the fields, then an error occurs. Thus, the number of fields on each line must be a multiple of the length of what, unless flush=TRUE. This is useful for checking that no fields have been omitted. If multi.line=TRUE, then reading continues and the positions of newlines are disregarded. By default, multi.line=TRUE.
comment.char a character vector of length one. Contains one single character or an empty string. The default value "" turns off the interpretation of comments altogether.
allowEscapes a logical value. If TRUE, C-style escapes such as \n are interpreted; if FALSE, they are read verbatim. The escapes that are interpreted are the control characters \a, \b, \f, \n, \r, \t, \v and octal and hexadecimal representations like \040 and \x2A.
fileEncoding a character string. The character encoding for the input stream, used for converting input bytes to characters. This can be any of the encodings accepted by iconv, including "native.enc". The default value, the empty string, means to use options("encoding"). This argument is not used when the file argument is a connection, because a connection already has an associated encoding (see file).
encoding the string encoding to use for newly-created strings. (See Encoding.) In TERR, this argument is not implemented.
text a character vector. If specified, and if file is not specified, then scan reads from the character vector.
skipNul a logical value. Specifies how to deal with nul (0) bytes in the input file.
  • If FALSE, then scan warns about nul bytes and probably gives incorrect results (because nul bytes terminate character strings).
  • If TRUE, then the nul bytes are omitted as they are read.
If the input character encoding is one that normally includes nul bytes (such as UTF-16), then this value is ignored, and these bytes are not skipped.
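
Several of the arguments above can be tried without a file by passing sample input through the text argument. The following is a minimal sketch: the input strings are made up for illustration, and the behavior described in the comments is that of open-source R, which TERR is assumed (not verified here) to match.

```r
# All input strings below are hypothetical sample data.

# skip and nlines: drop one header line, then read at most two lines.
a <- scan(text = "header\n1 2\n3 4\n5 6", skip = 1, nlines = 2, quiet = TRUE)

# dec: treat "," as the decimal point.
b <- scan(text = "1,5 2,25", dec = ",", quiet = TRUE)

# na.strings: read the sentinel "-999" as NA.
d <- scan(text = "1 -999 3", na.strings = "-999", quiet = TRUE)

# flush: ignore anything after the two requested fields on each line.
f <- scan(text = "1 2 trailing note\n3 4 another note",
          what = list(0, 0), flush = TRUE, quiet = TRUE)

# fill: pad the short second line with an empty third field.
g <- scan(text = "1 2 3\n4 5", what = list(0, 0, 0), fill = TRUE, quiet = TRUE)
```

Setting quiet=TRUE only suppresses the item-count message; it does not change what is read.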

Details

It is possible to read files that contain more than one mode by specifying a list as the what argument. For example, if the fields in the file myfile are alternately numeric and character, the command scan(myfile, what=list(0,"")) reads them and returns an object of mode "list" that has a numeric vector and a character vector as its two elements.
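
As a small illustration of the list form of what, the sketch below reads alternating numeric and character fields. It uses the text argument with made-up sample data instead of the file myfile, so that it can run on its own.

```r
# Hypothetical sample data: numeric and character fields alternate.
z <- scan(text = "1 a 2 b 3 c", what = list(0, ""), quiet = TRUE)

mode(z)    # the result is a list
z[[1]]     # numeric vector holding the first field of each record
z[[2]]     # character vector holding the second field of each record
```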
The elements of what can be anything, as long as you have numbers where you want numeric fields, character data where you want character fields, and complex numbers where you want complex fields. A NULL component in what causes the corresponding field to be skipped during input. Note that scan retains the names attribute of the list, if any. Thus, the command z <- scan(myfile, what=list(pop=0, city="")) allows you to refer to z$pop and z$city.
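
The following sketch combines named components with a skipped field. The sample records and the middle "ignored" column are invented for illustration; only the named components are inspected afterward.

```r
# Hypothetical records: population, a column to skip, and a city name.
z <- scan(text = "100 ignored Springfield\n250 ignored Shelbyville",
          what = list(pop = 0, NULL, city = ""), quiet = TRUE)

z$pop     # numeric values from the first field
z$city    # character values from the third field
```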
Any numeric field containing the characters NA is returned as a missing value. If the field separator (the sep argument) is given and the field is empty, the returned value is NA for a numeric or complex field and "" for a character field.
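
A minimal sketch of these rules, using made-up comma-separated input passed through the text argument:

```r
# An empty numeric field between separators is returned as NA.
x <- scan(text = "1,,3", sep = ",", quiet = TRUE)

# An empty character field between separators is returned as "".
s <- scan(text = "a,,c", what = "", sep = ",", quiet = TRUE)

# A numeric field containing the characters NA is a missing value.
n <- scan(text = "1 NA 3", quiet = TRUE)
```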
The main use of separators is to allow white space inside character fields. For example, suppose in the command above that the numeric field is to be followed by a tab, with text filling out the rest of the line. The command z <- scan(myfile, what=list(pop=0, city=""), sep="\t") allows blanks in the city name. With no separator, arbitrary white space can be included by quoting the whole string. With a separator, quotes are not used; if the separator character is to be included in a string, it must be escaped by a preceding backslash.
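
The two approaches to white space inside a character field can be sketched as follows. The population figures and city names are sample data invented for the illustration.

```r
# Tab-separated input: the blank inside each city name is kept.
z <- scan(text = "8400000\tNew York\n3900000\tLos Angeles",
          what = list(pop = 0, city = ""), sep = "\t", quiet = TRUE)

# No separator: quoting the whole string protects the blank.
q <- scan(text = "8400000 'New York'\n3900000 'Los Angeles'",
          what = list(pop = 0, city = ""), quiet = TRUE)

z$city
q$city
```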
As it reads more and more records, scan allocates more space to accommodate the growing vectors. If you supply an nmax or an n argument specifying how many rows or items you expect, then the vectors can be pre-allocated, which can result in much better performance.
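
The sketch below shows only how n and nmax limit what is read; it does not measure performance, and the input is made-up sample data.

```r
# n caps the total number of items read.
x <- scan(text = "1 2 3 4 5 6", n = 4, quiet = TRUE)

# With a list for what, nmax caps the number of records read.
z <- scan(text = "1 a\n2 b\n3 c", what = list(0, ""), nmax = 2, quiet = TRUE)
```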
If comment.char occurs (except inside a quoted character field), it signals that the rest of the line should be regarded as a comment and be discarded. Lines beginning with a comment character (possibly after white space with the default separator) are treated as blank lines.
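
For example, with "#" chosen as the comment character (an arbitrary choice for this sketch), both a trailing comment and a whole-line comment are discarded:

```r
# Hypothetical input with a trailing comment and a comment-only line.
x <- scan(text = "1 2 # trailing comment\n# whole-line comment\n3 4",
          comment.char = "#", quiet = TRUE)
x
```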

Value

a list or vector like the what argument if it is present, and a numeric vector if what is omitted.

See Also

read.table, parse, write, readline, file, Sys.setlocale, iconv, Encoding.

Examples

# Read numeric values from standard input.
## Not run: 
num <- scan()

## End(Not run)

## Not run: 
# Read text values from standard input.
txt <- scan(what="")

## End(Not run)

# Read from the specified text argument.
scan(text=c("1 2\n3 4", "5 6"))

# Read a label and two numeric fields to make a matrix.
tfile <- tempfile()
cat("row1 9 10", "row2 2 3", sep="\n", file=tfile)
z <- scan(tfile, list(name="", 0, 0))
mat <- cbind(z[[2]], z[[3]])
dimnames(mat) <- list(z$name, c("X","Y"))
mat

# Read a CSV file, skip the first line of the file,
# and save in single precision:
cat(file=tfile, sep="\n", "This line is skipped", "1,1,2,3,5,8", "13,21,34,55", "89,144,233,377")
scan(tfile, single(0), skip=1, sep=",")

# Clean up:
unlink(tfile)

Package base version 6.1.1-7