split
Split Data by Groups

Description

Returns a list in which each component is a data frame or a vector containing the values from x that correspond to unique values of f.

Usage

split(x, f, drop = FALSE, ...)
unsplit(value, f, drop = FALSE)
## Default S3 method:
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
## S3 method for class 'data.frame':
split(x, f, drop = FALSE, ...)
## S3 method for class 'Date':
split(x, f, drop = FALSE, ...)
## S3 method for class 'POSIXct':
split(x, f, drop = FALSE, ...)

split(x, f, drop = FALSE, ...) <- value
## Default S3 method:
split(x, f, drop = FALSE, ...) <- value
## S3 method for class 'data.frame':
split(x, f, drop = FALSE, ...) <- value

Arguments

x a vector or data frame containing the values to be grouped. If x is a vector, its values are grouped according to the f argument. If x is a data frame, its rows are grouped. Missing values (NAs) are allowed.
f a vector, a factor variable, or a list of such variables giving the groups for the data values.
  • If x is a data frame, the f variable is replicated until it has the same number of elements as there are rows in x.
  • If f is a list, its elements are passed to the interaction function, along with the sep, drop, and lex.order arguments, to produce a factor combining the information in the elements.
  • If f is longer than x, a warning is issued and some of the components in the result have zero length.
  • If x is longer than f, the contents of f are replicated to the length of x. A warning is issued if the length of x is not an exact multiple of the length of f.
drop a logical value. If TRUE, unused factor levels in f are dropped before splitting the data, to avoid zero-length output elements that they would cause.
sep a character string passed along to the interaction function when f is a list, used to separate the levels of the factors in f in the levels of the factor generated by interaction. It should include characters that are not in any of the factor levels in f.
lex.order a logical value passed along to the interaction function when f is a list. It controls the order of the factor levels in the combined factor. If FALSE, the levels of the first factor vary the fastest; otherwise, the levels of the last factor vary the fastest.
value a list of vectors or data frames compatible with a splitting of x with f.
... extra arguments to pass to the S3 methods for split or split<-. An error is generated if the particular method does not use these arguments.
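
As a rough illustration of the list form of f described above, the following sketch checks that splitting by a list gives the same result as calling interaction directly. The vectors x, f1, and f2 are made-up sample data, and the sketch assumes the default method forwards sep to interaction as the argument descriptions state.

```r
# Hypothetical sample data (not from the original page)
x  <- 11:16
f1 <- c("One", "One", "Two", "Two", "Two", "One")
f2 <- c("A", "B", "A", "B", "A", "B")

# Splitting by a list of grouping vectors ...
by_list <- split(x, list(f1, f2), sep = "/")

# ... should match splitting by the combined factor built explicitly
by_inter <- split(x, interaction(f1, f2, sep = "/"))

identical(by_list, by_inter)
names(by_list)   # "One/A" "Two/A" "One/B" "Two/B" (first factor varies fastest)
```

Passing lex.order = TRUE in both calls should instead make the levels of the last factor vary fastest.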

Details

If f is not a factor variable, split, split<-, and unsplit convert it to one before grouping, with the factor levels set to sort(unique(f)). If you want a different order for the levels, convert the f vector to a factor and define the levels explicitly before passing it to split.
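
The effect of the level order can be seen in a small sketch like the one below. The data and the names x, f, and g are illustrative assumptions, not part of the original page.

```r
# Hypothetical sample data
x <- c(10, 20, 30, 40)
f <- c("low", "high", "low", "high")

# A character f is converted to a factor with sorted levels
names(split(x, f))   # "high" "low"

# Defining the levels explicitly controls the order of the components
g <- factor(f, levels = c("low", "high"))
names(split(x, g))   # "low" "high"
```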
split<- assigns the elements of value to the appropriate positions within x, according to the groups defined by f and drop. If the length of value is not the same as the number of groups, the elements of value are repeated cyclically. If the length of an element of value is less than the number of elements in the corresponding group, then the list element is repeated cyclically. Elements of value are matched to groups by position, so names(value) is ignored and only the order of value matters. Normally value is a list. If it is a simple vector, each of its elements is treated as a separate list element.
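
A minimal sketch of these assignment rules follows, using made-up data. The names on the value list are deliberately swapped to show that they are ignored, assuming the positional matching described above.

```r
# Hypothetical sample data
x <- 1:6
f <- c("a", "b")   # recycled: positions 1, 3, 5 are "a"; 2, 4, 6 are "b"

# Names on value are ignored; the first element goes to the first group ("a").
# Each length-1 element is repeated to fill its three-element group.
split(x, f) <- list(b = 0, a = 100)
x   # 0 100 0 100 0 100
```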
split<- and unsplit are normally used to process objects produced by an earlier split operation, where f is the same value used to split an object originally, and value is a transformed version of the result of splitting the object. In this case, split<- and unsplit rearrange the transformed data to its original order.
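
The split, transform, and reassemble pattern described above might look like the following sketch. The data and the centering transformation are illustrative assumptions.

```r
# Hypothetical sample data
x <- c(5, 1, 4, 2, 3, 6)
f <- c("a", "b", "a", "b", "a", "b")

parts <- split(x, f)                                # group the values
centered <- lapply(parts, function(v) v - mean(v))  # transform each group
unsplit(centered, f)                                # back in the original order
# 1 -2 0 -1 -1 3
```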

Value

split returns a list in which each component contains all x values associated with a particular value of f. For example, if the third value in f is 12, the third value (or row) in x is placed in a list component with all other x values that have f values of 12.

Within each group, data values are ordered as they originally appeared in x. The names of the list components in the output are the corresponding group values if f is a numeric vector, or the corresponding levels if f is a factor variable.
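
For instance, with a numeric f, a sketch along these lines (using made-up values) shows both the component names and the preserved within-group order:

```r
# Hypothetical sample data; the first and third values share the group value 12
vals  <- c("x", "y", "z")
parts <- split(vals, c(12, 3, 12))

names(parts)    # "3" "12" (group values in sorted order)
parts[["12"]]   # "x" "z"  (original order within the group)
```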

split<- returns the modified variable. Any attributes from the original x variable, such as the dim attribute of a matrix, are maintained.
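
As a small check of the attribute behavior, the sketch below assigns into a matrix by column and confirms that the dim attribute survives. The matrix m is an illustrative assumption.

```r
# Hypothetical 2 x 3 matrix
m <- matrix(1:6, nrow = 2)

# Assign one value per column group; each value is repeated within its column
split(m, col(m)) <- list(0, 1, 2)

dim(m)   # 2 3 (still a matrix)
m
```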
unsplit returns a vector or data frame x for which split(x, f) equals value.

See Also

factor, sapply, tapply, interaction, Date, POSIXt

Examples

split(c("Martin", "Mary", "Matt"), c("Male", "Female", "Male")) 
x<-matrix(1:10, ncol=2)
split(x, col(x))
split(1:6, rep(factor(c('a','b'),levels=c('a','b','c')),3))
split(1:6, rep(factor(c('a','b'),levels=c('a','b','c')),3),drop=TRUE)
split(1:8, 1:2)

x <- 1:10
split(x, 1:2) <- 99:100
x

x <- matrix(1:10, ncol=2, dimnames=list(LETTERS[1:5],LETTERS[6:7]))
split(x, col(x)<row(x)) <- c(0,1)
x

x <- matrix(1:10, ncol=2)
y <- split(x, col(x))
unsplit(y, col(x))

da <- data.frame(col1=c("a","b","c","d","e","f"),col2=c(1,2,3,4,5,6))
split(da,factor(c("OddRows","EvenRows"), levels=c("OddRows","EvenRows")))

# replace each element in the group with the mean within that group

x <- c(2, 3, 5, 7, 11, 13, 17)
grp <- x < 10
split(x,grp) <- lapply(split(x, grp), mean)
x

df <- data.frame(a=1:10,b=101:110)
sp <- split(df,1:2)
sp[[2]]$a <- 1001:1005 # replace column 'a' of one group
split(df,1:2) <- sp # and restore the original data.frame
df

## Date
dt <- as.Date("1970-01-01") + 1:4
split(x = dt, f = c(1, 3))

## POSIXct
pct <- as.POSIXct("1970-01-01 00:00:00") + 1:4
split(x = pct, f = c(1, 3))

## split by several factors
str(split(11:20, f=list(rep(c("One","Two"),c(4,6)), rep(c("A","B","C"), c(4,4,2)))))
str(split(11:20, f=list(rep(c("One","Two"),c(4,6)), rep(c("A","B","C"), c(4,4,2))), drop = TRUE))
str(split(11:20, f=list(rep(c("One","Two"),c(4,6)), rep(c("A","B","C"), c(4,4,2))), lex.order = TRUE))
str(split(11:20, f=list(rep(c("One","Two"),c(4,6)), rep(c("A","B","C"), c(4,4,2))), sep="/"))

Package base version 6.1.4-13