tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
X | a vector of data to be grouped by index. Missing values (NAs) are allowed if FUN accepts them. |
INDEX | a list whose components are interpreted as factors, each the same length as X. The elements of the indices define a cell in a multi-way array corresponding to each X observation. Missing values (NAs) are allowed. The names in INDEX are used as the names of the dimnames of the result. If a vector is given, it is treated as a list with one component. |
FUN | a function, a character string, or a symbol giving the name of the function to apply to each cell. If FUN is omitted, tapply returns a vector that can be used to subscript the multi-way array that tapply normally produces. This vector is useful for computing residuals, for example by subtracting each observation's cell mean. |
... | optional arguments to be given to each invocation of FUN. |
simplify | a logical value. If FALSE, tapply always returns an array of mode "list". If TRUE (the default) and FUN returns a single atomic value for every cell, tapply returns an array whose mode matches that of the returned values. See Value for more information. (simplify is ignored if FUN is not supplied.) |
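The FUN row says the index vector returned when FUN is omitted is useful for computing residuals. A minimal sketch of that idea, using made-up data (`y`, `grp`, `cell_means`, `idx`, and `resid` are hypothetical names chosen for illustration): compute the cell means once, then subscript them with the index vector so that each observation is paired with its own cell's mean. It also passes FUN as a character string, which the FUN row allows.

```r
# Hypothetical data: six observations in two groups
y   <- c(2, 4, 6, 10, 20, 30)
grp <- c("a", "a", "a", "b", "b", "b")

cell_means <- tapply(y, grp, "mean")  # FUN given as a character string
idx        <- tapply(y, grp)          # FUN omitted: cell index of each element
resid      <- y - cell_means[idx]     # subtract each element's own cell mean

idx    # 1 1 1 2 2 2
resid  # values: -2 0 2 -10 0 10
```

Computing the means once per cell and expanding them with the index avoids recomputing a group statistic for every observation.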
If FUN is missing | returns a vector of indices. These indices give the position for each element of X in the array that would be returned if FUN were not missing. |
If FUN is present | calls FUN for each cell that has any data in it. |
If FUN returns a single atomic value for each cell (for example, mean or var) and simplify is TRUE | returns a multi-way array containing the values. The array has one dimension for each component of INDEX, and the extent of each dimension equals the number of levels in the corresponding component of INDEX. If INDEX has only one component, the result is a one-dimensional array. |
If FUN does not return a single atomic value for each cell, or simplify is FALSE | returns an array of mode "list" whose components are the values of the individual calls to FUN. In other words, the result is a list that has a dim attribute. (It prints as a list, but you can subscript it as you would an array.) |
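To make the "If FUN is missing" case concrete when INDEX has more than one component: in R, the returned index is a linear position in the multi-way array, counted in column-major order (the first factor varies fastest). The sketch below uses hypothetical vectors `y`, `f1`, and `f2`; subscripting the matrix of cell means with that index gives each element the mean of its own cell.

```r
# Hypothetical data: two grouping factors give a 2 x 2 array of cells
y  <- c(10, 20, 30, 40, 70)
f1 <- c("p", "q", "p", "q", "p")
f2 <- c("s", "s", "t", "t", "t")

means <- tapply(y, list(f1, f2), mean)  # 2 x 2 matrix of cell means
pos   <- tapply(y, list(f1, f2))        # FUN missing: linear cell position

# Column-major numbering: (p,s) = 1, (q,s) = 2, (p,t) = 3, (q,t) = 4
pos         # 1 2 3 4 3
means[pos]  # 10 20 50 40 50: each element's own cell mean
```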
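For the case where FUN does not return a single value, a short sketch with hypothetical data (`y`, `grp`, and `r` are illustrative names): `range()` returns two values per cell, so the default `simplify = TRUE` cannot collapse the results, and tapply returns a one-dimensional array of mode "list" that can still be subscripted by level name.

```r
# Hypothetical data in two groups
y   <- c(3, 1, 4, 1, 5, 9)
grp <- c("u", "v", "u", "v", "u", "v")

r <- tapply(y, grp, range)  # range() returns min and max: 2 values per cell

is.list(r)      # TRUE: an array of mode "list"
length(dim(r))  # 1: one dimension, since INDEX has one component
r[["u"]]        # 3 5
r[["v"]]        # 1 9
```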
```r
x <- as.factor(c('a', 'b', 'c', 'a'))
tapply(x, x, length)  # counts elements, similar to table()
# returns a 1-D integer array:
# a b c
# 2 1 1
tapply(x, x)  # FUN omitted: integer vector of positions in the array
# [1] 1 2 3 1

# data with NAs
x <- 1:40
x[c(5, 30)] <- NA
ind <- rep(c('A', 'B', 'C'), length.out = 40)
tapply(x, ind, mean)  # call mean on each cell
#    A    B    C
# 20.5   NA   NA
tapply(x, ind, mean, na.rm = TRUE)  # pass na.rm = TRUE on to mean(),
                                    # so mean ignores NA values
#     A     B     C
# 20.50 21.25 20.25
tapply(x, ind, mean, na.rm = TRUE, simplify = FALSE)
# same result, as a 1-D array of mode "list":
# $A
# [1] 20.5
#
# $B
# [1] 21.25
#
# $C
# [1] 20.25

ind1 <- rep(c('A', 'B', 'C', 'D'), length.out = 40)
tapply(x, list(ind, ind1), mean)
#    A  B  C  D
# A 19 22 19 22
# B NA 20 23 20
# C 21 NA 21 24

ind2 <- rep(c('A', 'B', 'C', 'D', 'E'), length.out = 40)
# returns an array with dim 3 x 4 x 5
tapply(x, list(ind, ind1, ind2), mean, na.rm = TRUE)
```
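One way to inspect the three-way result of the last example: it has 3 x 4 x 5 = 60 cells but only 40 observations, and because 3, 4, and 5 share no common factor, each observation lands in a cell of its own. In R, empty cells come back as NA, while the two cells that hold only an NA value give NaN, because `mean()` of an empty vector (after `na.rm = TRUE` drops the NA) is NaN. The sketch below restates the setup from the examples so that it runs on its own; the name `a` is chosen for illustration.

```r
# Same setup as the examples above
x <- 1:40
x[c(5, 30)] <- NA
ind  <- rep(c('A', 'B', 'C'), length.out = 40)
ind1 <- rep(c('A', 'B', 'C', 'D'), length.out = 40)
ind2 <- rep(c('A', 'B', 'C', 'D', 'E'), length.out = 40)

a <- tapply(x, list(ind, ind1, ind2), mean, na.rm = TRUE)

dim(a)                      # 3 4 5
sum(!is.na(a))              # 38 cells with a finite mean
sum(is.nan(a))              # 2 cells whose only value was NA
sum(is.na(a) & !is.nan(a))  # 20 empty cells, filled with NA
a["B", "A", "E"]            # NaN: the cell holding x[5]
```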