tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
| X | a vector of data to be grouped by index. Missing values (NAs) are allowed if FUN accepts them. |
| INDEX | a list whose components are interpreted as factors, each the same length as X. The elements of the indices define a cell in a multi-way array corresponding to each X observation. Missing values (NAs) are allowed. The names in INDEX are used as the names of the dimnames of the result. If a vector is given, it is treated as a list with one component. |
| FUN | a function, a character string, or a symbol giving the name of the function to apply to each cell. If FUN is omitted, tapply returns a vector that can be used to subscript the multi-way array that tapply normally produces. This vector is useful for computing residuals. |
| ... | optional arguments to be given to each invocation of FUN. |
| simplify | a logical value. If FALSE, tapply always returns an array of mode list. If TRUE (the default) and FUN returns a single atomic value for each cell, tapply returns an array whose mode matches those values. See Value for more information. (simplify is ignored if FUN is not supplied.) |
| If FUN is missing | returns a vector of indices. These indices give the position for each element of X in the array that would be returned if FUN were not missing. |
| If FUN is present | calls FUN for each cell that has any data in it. |
| If FUN returns a single atomic value for each cell (for example, functions mean or var) | returns a multi-way array containing the values. The array has the same number of dimensions as INDEX has components. The number of levels in a dimension matches the number of levels in the corresponding component of INDEX. If INDEX has only one component, this is a one-dimensional array. |
| If FUN does not return a single atomic value | returns an array of mode "list" whose components are the values of the individual calls to FUN. In other words, the result is a list that has a dim attribute. (This prints as a list, but you can subscript it as you would an array.) |
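
The description of FUN notes that the index vector returned when FUN is omitted is useful for computing residuals. A minimal sketch of that idea follows; the data and the names y, g, cell_means, cell_index, and res are illustrative assumptions, not part of tapply.

```r
# Hypothetical data: five observations in three groups
y <- c(10, 12, 20, 22, 30)
g <- c("a", "a", "b", "b", "c")

# Per-cell means: a 1-D array with one entry per level of g
cell_means <- tapply(y, g, mean)

# With FUN omitted, tapply gives each observation's position in that array
cell_index <- tapply(y, g)

# Look up each observation's cell mean, then subtract it to get residuals
res <- y - as.vector(cell_means[cell_index])
res
# [1] -1  1 -1  1  0
```

Subscripting by position in this way avoids matching on level names, so the same pattern should carry over when INDEX has several components.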
x <- as.factor(c('a', 'b', 'c', 'a'))
tapply(x, x, length)
# counts elements, similar to table
# returns 1-D integer array:
# a b c
# 2 1 1
tapply(x, x)
# returns integer vector of positions in array
# [1] 1 2 3 1
# data with NAs
x <- 1:40
x[c(5, 30)] <- NA
ind <- rep(c('A', 'B', 'C'), length = 40)
tapply(x, ind, mean)
# call mean on cells
#    A    B    C
# 20.5   NA   NA
tapply(x, ind, mean, na.rm = TRUE)
# pass the extra argument na.rm = TRUE to mean(),
# so mean will ignore NA values
#     A     B     C
# 20.50 21.25 20.25
tapply(x, ind, mean, na.rm = TRUE, simplify = FALSE)
# get same result as 1-D array of mode 'list'
# $A
# [1] 20.5
#
# $B
# [1] 21.25
#
# $C
# [1] 20.25
ind1 <- rep(c('A', 'B' ,'C', 'D'), length = 40)
tapply(x, list(ind, ind1), mean)
# two index vectors give a 3x4 array; cells that contain an NA return NA
#    A  B  C  D
# A 19 22 19 22
# B NA 20 23 20
# C 21 NA 21 24
ind2 <- rep(c('A','B','C','D', 'E'), length = 40)
# Returns array with dim 3x4x5
tapply(x, list(ind, ind1, ind2), mean, na.rm = TRUE)
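
When FUN does not return a single atomic value, the result is an array of mode "list". The sketch below shows one way to inspect and subscript such a result; the data and the names v, grp, and r are made up for illustration.

```r
# Hypothetical data: eight values split into two groups of four
v   <- c(3, 1, 4, 1, 5, 9, 2, 6)
grp <- rep(c("lo", "hi"), each = 4)

# range() returns two values per cell, so the result cannot be simplified
r <- tapply(v, grp, range)

mode(r)
# [1] "list"
dim(r)
# [1] 2

# Extract one cell's value by level name, as with an ordinary list
r[["hi"]]
# [1] 2 9
r[["lo"]]
# [1] 1 4
```

Double brackets extract the value stored in a cell, while single brackets would return a sub-array that is still of mode "list".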