Reshape Grouped Data

reshape

Description

Reshape data with repeated measurements on individuals at various times between "long" format and "wide" format.

Usage

reshape(data, varying = NULL, v.names = NULL, timevar = "time",
    idvar = "id", ids = NULL, times = NULL, drop = NULL,
    direction, new.row.names = NULL, sep = ".", 
    split = if (sep == "") {
        list(regexp = "[A-Za-z][0-9]", include = TRUE)
    } else {
        list(regexp = sep, include = FALSE, fixed = TRUE)
    })

Arguments

data	a data frame with repeated measurements. Each "individual" will have measurements on various aspects of it taken at a number of "times". In "long" format there will be a column of times, a column of ids, and one column per measurement type. It is expected that each individual will be measured at the same set of times. In "wide" format there will be a column of identifiers (with no repeated entries) and a column for each measurement at each time (and no time column). If data has the attribute reshapeWide or reshapeLong then the components of the attribute will be used as arguments to reshape: all other arguments are ignored.
varying	a list of equal-length vectors of variable names in wide format: each vector of variable names in the list corresponds to a single variable in the long format. That is, each component of the list contains the names of the columns referring to a single measurement type taken at different times. It can also be a matrix of variable names, where each row of the matrix acts like a component of the list described above. It can also be a vector of variable names, in which case reshape will try to intuit their meanings, assuming that the names are pasted together from the column names in v.names, the sep string, and the time values from the timevar column.
v.names	a character vector of variable names in long format that correspond to multiple variables in the wide format. If not supplied, all columns in the data argument except those named by the idvar, timevar, and drop arguments will be used.
timevar	a character string naming the time variable in long format, which distinguishes multiple records from the same group/individual.
idvar	a character vector of names of one or more variables in long format that identify multiple records from the same group/individual. These variables may also be present in wide format.
ids	the values to be used in the idvar variables in long format.
times	the values to be used in the timevar variable in long format.
drop	a character vector of names of variables to be dropped before the data is reshaped.
direction	a character string specifying the direction of the reshape: "wide" to reshape to wide format, or "long" to reshape to long format. This argument must be supplied unless data carries the attribute "reshapeLong" or "reshapeWide".
new.row.names	a character vector to be used as the row names of the reshaped data. If NULL, the row names are created from the values of the idvar and timevar variables in long format.
sep	a character string used as the separator between the variable name and the time point in the names of the measurement columns in wide format. When converting to wide format, sep is used to generate the new column names; when converting to long format with varying given as a vector of names, sep is used, via split, to find the measurement name and time point value encoded in the column names.
split	a list with three components: regexp, a regular expression; include, a logical; and fixed, a logical. This is used when converting to long format to decode the variable names given by varying. regexp is either a regular expression pattern (see regexpr) identifying where to split up the names (if fixed is FALSE) or a fixed string identifying where to split up the names (if fixed is TRUE). If include is TRUE then the first character of the matched text is included in the first part of the split name, otherwise it is not. This is useful when there is no separator character: e.g., the name "Joe10" may be split into "Joe" and "10" with the regular expression "[[:alpha:]][[:digit:]]".
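
As a rough illustration of how sep works in each direction, the sketch below assumes the behaviour of reshape() in base R's stats package. The data frames and their column names (wide, long_plain, x.1, y.2, and so on) are made up for the example and do not come from this page.

```r
# Hypothetical wide data: two measurement types (x, y) at times 1 and 2
wide <- data.frame(id  = 1:3,
                   x.1 = c(5, 6, 7),       x.2 = c(8, 9, 10),
                   y.1 = c(0.1, 0.2, 0.3), y.2 = c(0.4, 0.5, 0.6))

# varying is a plain character vector, so each name is split at sep = "."
# into a measurement name (x or y) and a time value (1 or 2)
long <- reshape(wide, direction = "long",
                varying = c("x.1", "x.2", "y.1", "y.2"),
                sep = ".", idvar = "id")
long

# Hypothetical long data built by hand, so it carries no reshape attribute
long_plain <- data.frame(id   = rep(1:2, each = 2),
                         time = rep(1:2, times = 2),
                         x    = c(5, 8, 6, 9))

# Going to wide format, sep joins the measurement name and the time value
wide_again <- reshape(long_plain, direction = "wide",
                      idvar = "id", timevar = "time",
                      v.names = "x", sep = "_")
names(wide_again)
```

When the column names have no separator at all, sep = "" selects the regular-expression form of split shown in Usage.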

Details

In connection with analysis of repeated measurements, data are often organized either in a wide format or a long format. The reshape() function can change data from one format to the other.

A reshaped data frame can be converted back simply by calling reshape(data), since most of the arguments are stored in an attribute of the data frame.

Value

A reshaped data frame with the attribute "reshapeLong" (when reshaped to long format) or "reshapeWide" (when reshaped to wide format).

The attribute "reshapeLong" is a list with components varying, v.names, idvar and timevar, which stores the information to reshape the data to long format.

The attribute "reshapeWide" is a list with components v.names, timevar, idvar, times and varying, which stores the information to reshape the data to wide format.
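
The stored attribute can be inspected directly. The following is a small sketch, again assuming base R's stats::reshape; the data frame wide and its score.* columns are hypothetical.

```r
# Hypothetical wide data with one measurement type at two times
wide <- data.frame(id = 1:2,
                   score.1 = c(10, 20),
                   score.2 = c(11, 21))

long <- reshape(wide, direction = "long",
                varying = c("score.1", "score.2"),
                sep = ".", idvar = "id")

# Components recorded on the long-format result
names(attr(long, "reshapeLong"))

# With no other arguments, reshape() uses the stored components
# to convert the data back to wide format
back <- reshape(long)
back
```

Note that subsetting or otherwise rebuilding the data frame may drop the attribute, in which case the arguments have to be given explicitly again.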

See Also

strsplit, regexpr, unlist

Examples

# "wide" format data: measured 'conc' on days 1, 3, and 5 for 2 animals
d1 <- data.frame(animal = c("Dog", "Cat"),
                 conc1 = c(10.1, 1.1),
                 conc3 = c(30.3, 3.3),
                 conc5 = c(50.5, 5.5),
                 treatment = c("t1", "t2"))
d1.long <- reshape(d1, direction = "long",
            varying = list(c("conc1", "conc3", "conc5")),
            v.names= "conc",
            idvar = "animal")
d1.long
# as above, but let reshape intuit v.names and the time values from the column names
reshape(d1, direction = "long",
            varying = c("conc1", "conc3", "conc5"),
            sep="",
            idvar = "animal")
# convert it back to original shape
reshape(d1.long)
# "long" format: one row for each parent
d2 <- data.frame(
    child=c("Alan","Alan","Susan","Susan"),
    childAge=c(2, 2, 10, 10),
    parent=c("Betty", "Chris", "Ulam", "Tammy"),
    parentSex=c("Female", "Male", "Male", "Female"),
    parentAge=c(26, 28, 44, 42))
# reshape to have one row per child, with columns for each parent
reshape(d2, direction = "wide", idvar = "child",
    timevar="parentSex",
    v.names=c("parent", "parentAge"))

Package stats version 6.0.0-69