Read Fixed Width Format Files

read.fwf

Description

Reads a data frame from a file in fixed-width format.

Usage

read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, 
    row.names, col.names, n = -1, buffersize = 2000, ...)

Arguments

file	a character string specifying the file name, or a connection object, to be read.
widths	an integer vector or list containing the fixed widths for one-line or multi-line records. Currently, multi-line records are not supported.
header	a logical flag indicating whether the first line of the file is read and used as the header. If TRUE, the header names must be separated by sep.
sep	the field separator (single character) used to separate the header fields. Only used when header=TRUE. The default is "\t" for tab.
skip	the number of lines to be skipped at the beginning of file. The default 0 means no line is skipped.
row.names	optional specification of the row names for the data frame. If provided, it can give the actual row names, as a character vector of length equal to the number of rows. Alternatively, it can specify the column name or index of the column to use as row names. Row names, wherever they come from, must be unique.
col.names	optional names for the variables. If missing, the header information, if any, is used; if all else fails, "V" and the field number are pasted together. Unless check.names=FALSE, the names are converted to syntactic names before assignment.
n	the maximum number of lines to read. -1 means no limit.
buffersize	the maximum number of lines to read at one time. This argument is not yet implemented.
...	other optional arguments, listed below:
dec	the character used for decimal points.
as.is	controls the conversion of character columns to factor objects. The default is the inverse of stringsAsFactors. This argument controls the conversion of columns unless colClasses is specified. Its value is either a vector of logicals or a vector of numeric or character indices specifying which columns should not be converted to factors. The argument is recycled as needed to a length equal to the number of fields; thus, as.is=FALSE converts all character fields.
na.strings	a character vector of strings to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric, and complex fields.
colClasses	a character vector specifying the classes of the columns. It is recycled as necessary; unspecified values are taken to be NA. Possible values are NA (guess the type), "NULL" (skip the column), an atomic vector class, "factor", "Date", or "POSIXct".
check.names	a logical value. If TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If needed, they are adjusted by make.names to ensure that they are valid and without duplicates.
strip.white	a logical value. If TRUE, read.fwf strips leading and trailing white space from character fields (numeric fields are always stripped).
stringsAsFactors	a logical value. An alternate (now preferred) name for the inverse of the as.is argument. The default is FALSE, unless options(stringsAsFactors=TRUE) has been set. This value is overridden by as.is and colClasses. Note that as of version 5.2 of TIBCO Enterprise Runtime for R, default.stringsAsFactors() returns FALSE instead of the previous TRUE unless the user has executed options(stringsAsFactors=TRUE) earlier in the session. This matches the same change in version 4.0.0 of open-source R.
fileEncoding	a character string giving the character encoding for the input stream, used for converting input bytes to characters. This can be any of the encodings accepted by iconv, including "native.enc". The default value, the empty string, means to use options("encoding"). This argument is not used when the file argument is a connection, since a connection already has an associated encoding (see file).
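
As a hedged illustration of how header, sep, skip, and n can be combined, the sketch below reads a small file whose first line is a note to be skipped and whose second line is a tab-separated header. The file contents and column names are invented for illustration, and the sketch assumes that skipped lines are dropped before the header line is read, as in open-source R.

```r
# Hypothetical sample file: one note line, a tab-separated header, three records.
f <- tempfile()
writeLines(c(
  "sample export (this line is skipped)",
  "ID\tName\tHeight",
  "ID01Joe   6.2",
  "ID02Janet 5.6",
  "ID03Adam  5.7"
), f)

# skip = 1 drops the note line; header = TRUE reads the next line as the
# column names, split on sep; n = 2 limits the read to two records.
d <- read.fwf(f, widths = c(4, 6, 3), header = TRUE, sep = "\t",
              skip = 1, n = 2, strip.white = TRUE)
print(d)
unlink(f)
```

Note that sep applies only to the header line here; the data records are still split by widths.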

Details

The fixed widths are defined by the widths argument.

Zero-width fields are replaced with NAs. A negative width means that the field is skipped. Note that the lengths of col.names and colClasses should equal the number of non-negative widths.
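
The arithmetic behind widths can be sketched in a few lines of base R. This is only an illustration of how widths map to character positions, not a description of how read.fwf is implemented; the variable names (line, starts, ends, keep) are hypothetical.

```r
line   <- "ID01Joe   M6.21970-01-01"
widths <- c(4, 6, -1, 3, 10)

# Every width, kept or skipped, consumes abs(width) characters of the record.
ends   <- cumsum(abs(widths))
starts <- ends - abs(widths) + 1

# Only non-negative widths produce columns; negative widths are skipped.
keep   <- widths >= 0
fields <- substring(line, starts[keep], ends[keep])
print(fields)

# col.names and colClasses need one entry per kept field.
col.names <- c("ID", "Name", "Height", "DateOfBirth")
stopifnot(length(col.names) == sum(keep))
```

Here the single-character gender field at position 11 is skipped, so four names are needed rather than five.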

In open-source R (as of version 2.14.0), read.fwf converts the fixed-width file to a delimited format suitable for read.table and then calls read.table to read it. That is why the ... arguments exist. TIBCO Enterprise Runtime for R reads the fixed-width file directly.
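
The convert-then-read approach attributed to open-source R can be approximated as follows. This is a simplified sketch written for illustration, not the actual source of read.fwf; the sample records and the names records, starts, ends, and delimited are assumptions.

```r
records <- c("ID01Joe   6.2", "ID02Janet 5.6", "ID03Adam  5.7")
widths  <- c(4, 6, 3)
ends    <- cumsum(widths)
starts  <- ends - widths + 1

# Step 1: cut each record into fields and join the fields with a separator.
delimited <- vapply(records,
                    function(r) paste(substring(r, starts, ends), collapse = "\t"),
                    character(1), USE.NAMES = FALSE)

# Step 2: hand the delimited text to read.table, which does the type
# conversion and accepts arguments such as strip.white.
d <- read.table(text = delimited, sep = "\t", quote = "",
                strip.white = TRUE, stringsAsFactors = FALSE)
str(d)
```

Because the final step is an ordinary read.table call, any argument that read.table understands can be passed through.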

Value

A data frame. Unless colClasses is specified, fields are initially read in as character data. If all the items in a field are numeric, the corresponding variable is numeric. Otherwise, it is character or factor, as controlled by the as.is argument.
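
A short sketch of the type guessing described above, using an invented three-column file. With as.is=TRUE, the non-numeric fields are expected to stay character and the all-numeric field to become numeric; this assumes behavior matching open-source R.

```r
# Hypothetical sample file with two text fields and one numeric field.
f <- tempfile()
writeLines(c("ID01Joe   6.2", "ID02Janet 5.6", "ID03Adam  5.7"), f)

# No colClasses: the column types are guessed from the field contents.
d <- read.fwf(f, widths = c(4, 6, 3), as.is = TRUE)
print(sapply(d, class))
unlink(f)
```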

See Also

read.table, file, readLines

Examples

    f <- tempfile()
    cat("ID01Joe   M6.21970-01-01\nID02Janet F5.61977-04-20\nID03Adam  M5.71983-03-13\n", file=f)
    # Minimal specification, guess types
    d <- read.fwf(f, c(4,6,1,3,10))
    str(d)
    #data.frame w/ 3 obs. of 5 variables:
    # $V1: factor[1:3] w/ 3 levels ID01 ID02 ID03:  1 2 3
    # $V2: factor[1:3] w/ 3 levels Adam   Janet ..:  3 2 1
    # $V3: factor[1:3] w/ 2 levels F M:  2 1 2
    # $V4: num[1:3] 6.2 5.6 5.7
    # $V5: factor[1:3] w/ 3 levels 1970-01-01..:  1 2 3
    # Full specification: names and types
    d <- read.fwf(f, c(4,6,1,3,10),
        col.names=c('ID','Name','Gender','Height','DateOfBirth'),
        colClasses=c('character','character','factor','numeric','Date'),
        strip.white=TRUE)
    str(d)
    #data.frame w/ 3 obs. of 5 variables:
    # $ID: chr[1:3] "ID01" "ID02" "ID03"
    # $Name: chr[1:3] "Joe" "Janet" "Adam"
    # $Gender: factor[1:3] w/ 2 levels F M:  2 1 2
    # $Height: num[1:3] 6.2 5.6 5.7
    # $DateOfBirth: Date[1:3] "1970-01-01" "1977-04-20" "1983-03-13"
    # Skip gender column
    d <- read.fwf(f, c(4,6,-1,3,10),
        col.names=c('ID','Name','Height','DateOfBirth'),
        colClasses=c('character','character','numeric','Date'),
        strip.white=TRUE)
    str(d)
    #data.frame w/ 3 obs. of 4 variables:
    # $ID: chr[1:3] "ID01" "ID02" "ID03"
    # $Name: chr[1:3] "Joe" "Janet" "Adam"
    # $Height: num[1:3] 6.2 5.6 5.7
    # $DateOfBirth: Date[1:3] "1970-01-01" "1977-04-20" "1983-03-13"

Package utils version 6.0.0-69