read.table
Create a Data Frame by Reading a Table

Description

Reads data from a file or a connection in table format and creates a data frame with the same number of rows as there are lines in the file, and the same number of variables as there are fields in the file.

Usage

read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
    row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA",
    colClasses = NA, nrows = -1, skip = 0, check.names = TRUE,
    fill = !blank.lines.skip, strip.white = FALSE,
    blank.lines.skip = TRUE, comment.char = "#",
    allowEscapes = FALSE, flush = FALSE,
    stringsAsFactors = default.stringsAsFactors(),
    fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
read.delim(file, header = TRUE, sep = "\t", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.delim2(file, header = TRUE, sep = "\t", quote = "\"", dec = ",",
    fill = TRUE, comment.char = "", ...)
read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.csv2(file, header = TRUE, sep = ";", quote = "\"", dec = ",",
    fill = TRUE, comment.char = "", ...)

Arguments

file a character string specifying the file or connection from which to read the data. The file should contain one line per row of the table. The fields are separated by the sep character. file can also be a complete URL.
header a logical flag. If TRUE, the first line of the file is used as the variable names of the resulting data frame.
  • For read.table, the default is FALSE, unless there is one fewer field in the first line of the file than in the second line.
  • For read.delim, read.delim2, read.csv, and read.csv2, the default is TRUE.
sep the field separator (a single character). Use "\t" to specify a tab separator. If omitted, any amount of white space (blanks or tabs) can separate fields.
quote a character vector specifying the set of quoting characters. To disable quoting, use quote="".
dec the character used for decimal points.
row.names optional specification of the row names for the data frame. If provided, it can give the actual row names as a character vector of length equal to the number of rows. Alternatively, it can specify the column name or index of the column to use as row names.

If row.names is missing, and if there is a header line, and if the first row contains one fewer field than the number of columns, the first column in the input is used for the row names.

Wherever they come from, row names must be unique.

col.names optional names for the variables. If missing, the header information, if any, is used. If all else fails, "V" and the field number are pasted together. Unless check.names=FALSE, the names are converted to syntactic names before assignment.
as.is a vector that controls conversions to factor objects. The default is !stringsAsFactors, so character variables are converted to factors only when stringsAsFactors is TRUE. The value of as.is is either a vector of logicals or a vector of numeric or character indices that specify which columns should not be converted to factors. This argument controls the conversion of columns except when the colClasses argument is specified. The argument is replicated as needed to be of length equal to the number of fields. Thus, as.is=FALSE converts all character fields.
na.strings a character vector. These strings are interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric, and complex fields.
colClasses a character vector. Specifies the classes for the columns. Recycled as necessary, or unspecified values are taken to be NA. Possible values are NA (determine the type), "NULL" (the column is skipped), atomic vector classes, or "factor", "Date", or "POSIXct".

colClasses should be specified for every column, including the column for row names (if it exists).

nrows an integer that specifies the maximum number of lines to be read. Negative and invalid values are ignored.
skip an integer that specifies the number of lines in the file to skip before reading data.
check.names a logical value. If TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If needed, they are adjusted by make.names to ensure they are valid and without duplicates.
fill a logical value. If TRUE, then if the rows have unequal length, blank fields are added implicitly.
strip.white a logical value. Used only when sep is specified. If TRUE, read.table strips the leading and trailing white space from character fields. (Numeric fields are always stripped.)
blank.lines.skip a logical value. If TRUE, blank lines are ignored.
comment.char a character value that specifies the character to use for comments. Any data after comment.char on a line are ignored. If "", turns off the interpretation of comments altogether.
allowEscapes a logical value. If TRUE, C-style escapes such as "\n" are processed. If FALSE (the default), they are read verbatim.
flush a logical value. If TRUE, read.table flushes to the end of the line after reading the last of the fields requested. The default is FALSE.
stringsAsFactors a logical value. An alternative (now preferred) name for the inverse of the as.is argument. The default is FALSE, unless options(stringsAsFactors=TRUE) has been set. This value is overridden by as.is and colClasses.

Note that as of version 5.2 of TIBCO Enterprise Runtime for R, default.stringsAsFactors() returns FALSE instead of the previous TRUE unless the user has executed options(stringsAsFactors=TRUE) earlier in the session. This matches the same change in version 4.0.0 of open-source R.

fileEncoding a character string specifying the character encoding for the input stream. Used for converting input bytes to characters. This can be any of the encodings accepted by iconv, including "native.enc". The default value, the empty string, specifies using options("encoding"). This argument is not used when the file argument is a connection, because a connection already has an associated encoding (see file).
encoding the string encoding to use for newly-created strings. (See Encoding.) In TIBCO Enterprise Runtime for R, this argument is not implemented.
text a character vector. If specified, and file is not specified, read.table reads from the character vector.
skipNul a logical value that specifies how to handle nul (0) bytes in the input file. If FALSE, read.table warns about nul bytes and will probably give inappropriate results (because they terminate strings). If TRUE, the nul bytes are omitted as they are read. If the input character encoding is one that normally includes nul bytes (such as UTF-16), this value is ignored, and these bytes are not skipped.
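As a rough illustration of how several of the arguments above combine, the following sketch reads a small in-memory table through the text argument. The sample data and the names raw and d are invented for this illustration and are not part of the package.

```r
# Hypothetical sample input: two preamble lines, a header line,
# a comment line, and "n/a" used as a missing-value marker.
raw <- c(
  "exported by some tool",
  "units: litres, count",
  "Carline;EngDispl;Cylinders",
  "SOUL;1,6;4",
  "# a comment line",
  "Q7;n/a;6",
  "SIENNA;2,7;4"
)

d <- read.table(text = raw, header = TRUE, sep = ";", dec = ",",
                skip = 2,              # drop the two preamble lines
                na.strings = "n/a",    # treat "n/a" as NA
                comment.char = "#",    # ignore the comment line
                stringsAsFactors = FALSE)
d
```

With these settings, the three data lines should yield three rows, with EngDispl read as numeric and a missing value in the Q7 row.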

Details

read.csv, read.csv2, read.delim, and read.delim2 are wrappers around read.table with different default arguments. The default sep is "," for read.csv, ";" for read.csv2, and "\t" for read.delim and read.delim2. The dec character is set to "," in read.csv2 and read.delim2. The defaults for header and fill are set to TRUE for these four functions, quote defaults to "\"", and comment.char defaults to "".
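To make the relationship concrete, the sketch below reads the same made-up input once with read.csv2 and once with read.table, spelling out the defaults listed in the Usage section. The names txt, via_csv2, and via_table are hypothetical.

```r
# Hypothetical sample data: semicolon-separated fields with a comma
# as the decimal mark.
txt <- c("Carline;EngDispl", "SOUL;1,6", "SIENNA;2,7")

via_csv2  <- read.csv2(text = txt)
via_table <- read.table(text = txt, header = TRUE, sep = ";",
                        quote = "\"", dec = ",", fill = TRUE,
                        comment.char = "")

# Both calls are expected to produce the same data frame.
identical(via_csv2, via_table)
```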
If colClasses is not set, all columns are considered as character columns and then converted to logical, integer, numeric, complex, or factor (depending on as.is). Quotes are interpreted in all fields.
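The effect of colClasses on this conversion step can be sketched as follows. The sample data and the names txt, inferred, and forced are assumptions made for the illustration.

```r
# Hypothetical sample data with a logical, an integer, and a numeric column.
txt <- c("id flag count ratio", "a TRUE 1 0.5", "b FALSE 2 1.5")

# No colClasses: each column is read as character and then converted.
inferred <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
sapply(inferred, class)

# With colClasses: each column gets the class that was requested.
forced <- read.table(text = txt, header = TRUE,
                     colClasses = c("character", "character",
                                    "numeric", "numeric"))
sapply(forced, class)
```

Setting colClasses is useful when the inferred type is not the one you want, for example to keep an identifier column as character.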
If row.names is not set and the first line has one fewer entry than the number of columns, then the first column is taken to be the row names. If row.names is set and does not refer to the first column, that column is discarded.
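When the header is complete, a column can still be named explicitly as the source of the row names. A minimal sketch, using invented data and the hypothetical names txt and d:

```r
# Hypothetical sample data with a full header line.
txt <- c("Carline EngDispl Cylinders", "SOUL 1.6 4", "Q7 3 6")

# Name the column that should supply the row names.
d <- read.table(text = txt, header = TRUE, row.names = "Carline")
rownames(d)   # taken from the Carline column
names(d)      # Carline is no longer a data column
```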
The number of data columns is determined by examining the first five lines of input or the length of col.names.
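Because the column count is based on only the first few lines, it can help to check the number of fields on every line before reading. The sketch below uses count.fields (listed under See Also) for that purpose. It assumes that count.fields accepts an open connection, as it does in open-source R; the names txt, con, and n_fields are hypothetical.

```r
# Hypothetical sample data: the third line has only two fields.
txt <- c("a,b,c", "1,2,3", "4,5", "6,7,8")

con <- textConnection(txt)
n_fields <- count.fields(con, sep = ",")  # number of fields on each line
close(con)

n_fields
which(n_fields != n_fields[1])  # lines whose field count differs from line 1
```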

Value

a data frame with as many rows as the file has lines (or one fewer if header=TRUE) and as many variables as the file has fields (or one fewer if one variable was used for row names).
Unless colClasses is specified, fields are initially read in as character data. If all the items in a field are numeric, the corresponding variable is numeric. Otherwise, it is character or factor, as controlled by the as.is argument.
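The following sketch shows the rule for a column that contains one non-numeric item. The data and the names txt, as_char, and as_factor are made up for the illustration.

```r
# Hypothetical sample data: "two" prevents score from being numeric.
txt <- c("name score", "a 1", "b two", "c 3")

as_char   <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
as_factor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

class(as_char$score)    # not all items are numeric, so the column is character
class(as_factor$score)  # same data, converted to a factor instead
```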
Unless fill=TRUE, all lines must have the same number of fields (except the header, which can have one fewer if the first field is used for row names).
If input is empty, it causes an error unless col.names is set, in which case a 0-row data frame is created.
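A short sketch of the empty-input case, using the text argument with an empty string to stand in for an empty file. The names empty and res are hypothetical.

```r
# With col.names, empty input gives a data frame with no rows.
empty <- read.table(text = "", col.names = c("Carline", "EngDispl"))
dim(empty)
names(empty)

# Without col.names, the same input is an error; capture it for inspection.
res <- tryCatch(read.table(text = ""), error = function(e) e)
inherits(res, "error")
```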

References

Chambers, J. M. and Hastie, T. J. (Eds.) 1992. Data for models. Statistical Models in S. Chapter 3. Pacific Grove, CA: Wadsworth & Brooks/Cole.

See Also

scan, count.fields, Sys.setlocale, iconv, Encoding, write.table

Examples

# A comma delimited file:
carfile1 <- tempfile()
write(file=carfile1, ncol=1, c(
"Carline,EngDispl,Cylinders,CityMPG,HwyMPG,CombinedMPG",
"SOUL,1.6,4,25,30,27",
"Q7,3,6,16,22,18",
"SIENNA,2.7,4,19,24,21",
"CAMRY,2.5,4,25,35,28",
"G37x,3.7,6,18,25,20",
"FRONTIER2WD,2.5,4,19,23,21"
))
## Not run: 
file.show(carfile1)  # view file
## End(Not run)

CarMPG1 <- read.table(carfile1, header=TRUE, sep=",")
CarMPG1
file.remove(carfile1); rm(carfile1, CarMPG1) # Clean up

# A space delimited file with spaces inside strings:
carfile2 <- tempfile()
write(file=carfile2, ncol=1, c(
'Carline EngDispl Cylinders CityMPG HwyMPG CombinedMPG',
'"Kia SOUL" 1.6 4 25 30 27',
'"Audi Q7" 3 6 16 22 18',
'"Toyota SIENNA" 2.7 4 19 24 21',
'"Toyota CAMRY" 2.5 4 25 35 28',
'"Infiniti G37x" 3.7 6 18 25 20',
'"Nissan FRONTIER2WD" 2.5 4 19 23 21'
))
## Not run: 
file.show(carfile2)  # view file, note strings are in quotes
## End(Not run)

CarMPG2 <- read.table(carfile2, header=TRUE)
CarMPG2
file.remove(carfile2); rm(carfile2, CarMPG2) # Clean up

# A space delimited file where the first row has one fewer field:
carfile3 <- tempfile()
write(file=carfile3, ncol=1, c(
'EngDispl Cylinders CityMPG HwyMPG CombinedMPG',
'"Kia_SOUL" 1.6 4 25 30 27',
'"Audi_Q7" 3 6 16 22 18',
'"Toyota_SIENNA" 2.7 4 19 24 21',
'"Toyota_CAMRY" 2.5 4 25 35 28',
'"Infiniti_G37x" 3.7 6 18 25 20',
'"Nissan_FRONTIER2WD" 2.5 4 19 23 21'
))
## Not run: 
file.show(carfile3)  # view file
## End(Not run)

CarMPG3 <- read.table(carfile3) # header=TRUE assumed
CarMPG3 # First column in file used as the row names
file.remove(carfile3); rm(carfile3, CarMPG3) # Clean up

Package utils version 6.0.0-69