read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, row.names, col.names, n = -1, buffersize = 2000, ...)
file | a character string specifying the file name, or a connection object, to be read. |
widths | an integer vector or list containing the fixed widths for one-line or multi-line records. Currently, multi-line records are not supported. |
header | logical flag to indicate if the first line of the file is read and used as header. If TRUE, the header names must be separated by sep. |
sep | the field separator (single character) used to separate the header fields. Only used when header=TRUE. The default is "\t" for tab. |
skip | the number of lines to be skipped at the beginning of file. The default 0 means no line is skipped. |
row.names |
optional specification of the row names for the data frame.
If provided, it can give the actual row names, as a character vector of length equal to
the number of rows. Alternatively, it can specify the column name or index of the column
to use as row names.
Row names, wherever they come from, must be unique. |
col.names | optional names for the variables. If missing, the header information, if any, is used; if all else fails, "V" and the field number are pasted together. Unless check.names=FALSE, the names will be converted to syntactic names before assignment. |
n | the maximum number of lines to read. -1 means no limit. |
buffersize | the maximum number of lines to read at one time. This argument is not yet implemented. |
... |
other optional arguments:
dec
the character used for decimal points.
as.is
control over conversions to factor objects.
The default behavior is to convert character variables to factors.
This argument controls the conversion of columns unless colClasses is specified.
Its value is either a vector of logicals or a vector of numeric or character indices specifying
which columns should not be converted to factors.
The argument will be replicated as needed to be of length equal to the number of fields; thus, as.is=FALSE converts all character fields.
na.strings
character vector; these strings will be interpreted as NA values.
Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.
colClasses
a character vector specifying the classes for the columns. It is recycled as necessary; unspecified values are taken to be NA.
Possible values are NA (guess the type), "NULL" (skip the column), atomic vector classes, "factor", "Date",
or "POSIXct".
check.names
logical value. If TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically
valid variable names. If needed, they are adjusted by make.names to ensure they are valid and without duplicates.
strip.white
logical value. If TRUE, read.fwf will strip the leading and trailing white space
from character fields (numeric fields are always stripped).
stringsAsFactors
logical value. An alternate (now preferred) name for the inverse of the as.is argument.
The default is FALSE, unless one sets options(stringsAsFactors=TRUE).
This value is overridden by as.is and colClasses.
Note that as of version 5.2 of TIBCO Enterprise Runtime for R, default.stringsAsFactors() returns FALSE instead of the previous TRUE unless the user has executed options(stringsAsFactors=TRUE) earlier in the session. This matches the same change in version 4.0.0 of open-source R.
fileEncoding
character string.
the character encoding for the input stream,
used for converting input bytes to characters.
This can be any of the encodings accepted by iconv,
including "native.enc".
The default value, the empty string, means to use options("encoding").
This argument is not used when the file argument is a connection,
since a connection already has an associated encoding (see file).
|
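As an illustration of how header, sep, and skip work together, here is a minimal sketch. The file contents and the names f2 and d2 are made up for this example, and the behavior in the comments assumes read.fwf follows the argument descriptions above (as open-source R does): the skipped line is discarded first, the header line is then split on sep, and only the remaining lines are cut by widths.

```r
# Hypothetical input: one note line, a tab-separated header,
# then fixed-width records (ID = 4, Name = 6, Height = 3 characters)
f2 <- tempfile()
cat("sample export, not part of the data\n",
    "ID\tName\tHeight\n",
    "ID01Joe   6.2\n",
    "ID02Janet 5.6\n",
    "ID03Adam  5.7\n",
    file = f2, sep = "")

# skip = 1 drops the note line; header = TRUE reads the next line as
# column names, split on sep; widths applies only to the data lines
d2 <- read.fwf(f2, widths = c(4, 6, 3), header = TRUE, sep = "\t",
               skip = 1, strip.white = TRUE)
names(d2)
nrow(d2)
unlink(f2)
```

Note that the header line is not fixed-width: its fields are delimited by sep, so the header names do not have to line up with the data columns.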
f <- tempfile()
# Name fields are padded with spaces to 6 characters to match the widths below
cat("ID01Joe   M6.21970-01-01\nID02Janet F5.61977-04-20\nID03Adam  M5.71983-03-13\n", file=f)

# Minimal specification, guess types
d <- read.fwf(f, c(4,6,1,3,10))
str(d)
#data.frame w/ 3 obs. of 5 variables:
# $V1: factor[1:3] w/ 3 levels ID01 ID02 ID03: 1 2 3
# $V2: factor[1:3] w/ 3 levels Adam Janet ..: 3 2 1
# $V3: factor[1:3] w/ 2 levels F M: 2 1 2
# $V4: num[1:3] 6.2 5.6 5.7
# $V5: factor[1:3] w/ 3 levels 1970-01-01..: 1 2 3

# Full specification: names and types
d <- read.fwf(f, c(4,6,1,3,10),
    col.names=c('ID','Name','Gender','Height','DateOfBirth'),
    colClasses=c('character','character','factor','numeric','Date'),
    strip.white=TRUE)
str(d)
#data.frame w/ 3 obs. of 5 variables:
# $ID: chr[1:3] "ID01" "ID02" "ID03"
# $Name: chr[1:3] "Joe" "Janet" "Adam"
# $Gender: factor[1:3] w/ 2 levels F M: 2 1 2
# $Height: num[1:3] 6.2 5.6 5.7
# $DateOfBirth: Date[1:3] "1970-01-01" "1977-04-20" "1983-03-13"

# Skip gender column
d <- read.fwf(f, c(4,6,-1,3,10),
    col.names=c('ID','Name','Height','DateOfBirth'),
    colClasses=c('character','character','numeric','Date'),
    strip.white=TRUE)
str(d)
#data.frame w/ 3 obs. of 4 variables:
# $ID: chr[1:3] "ID01" "ID02" "ID03"
# $Name: chr[1:3] "Joe" "Janet" "Adam"
# $Height: num[1:3] 6.2 5.6 5.7
# $DateOfBirth: Date[1:3] "1970-01-01" "1977-04-20" "1983-03-13"
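
The na.strings, colClasses, and as.is options can be combined with the same kind of file. The following is a sketch under assumptions: the data, the "N/A" marker, and the names f3, w, nm, d3, and d4 are invented for illustration, and the behavior noted in the comments follows the argument descriptions above as implemented in open-source R.

```r
# Hypothetical input; "N/A" marks a missing height
f3 <- tempfile()
cat("ID01Joe   M6.2\n",
    "ID02Janet FN/A\n",
    "ID03Adam  M5.7\n",
    file = f3, sep = "")
w  <- c(4, 6, 1, 3)
nm <- c("ID", "Name", "Gender", "Height")

# colClasses = "NULL" skips the Gender column;
# na.strings turns "N/A" into NA in the numeric column
d3 <- read.fwf(f3, w, col.names = nm,
               colClasses = c("character", "character", "NULL", "numeric"),
               na.strings = "N/A", strip.white = TRUE)

# Without colClasses, as.is selects the columns left unconverted:
# here only Gender (as.is = FALSE) is converted to a factor
d4 <- read.fwf(f3, w, col.names = nm,
               as.is = c(TRUE, TRUE, FALSE, TRUE),
               na.strings = "N/A", strip.white = TRUE)
str(d3)
str(d4)
unlink(f3)
```

Skipping a column with colClasses="NULL" still requires a width and a name for it, whereas a negative width (as in the previous example) removes the field before names and classes are assigned.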