tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)
X | a vector of data to be grouped by index. Missing values (NAs) are allowed if FUN accepts them. |
INDEX | a list whose components are interpreted as factors, each the same length as X. The elements of the indices define a cell in a multi-way array corresponding to each X observation. Missing values (NAs) are allowed. The names in INDEX are used as the names of the dimnames of the result. If a vector is given, it is treated as a list with one component. |
FUN | a function, a character string, or a symbol giving the name of the function to apply to each cell. If FUN is omitted or NULL, tapply returns a vector that can be used to subscript the multi-way array that tapply normally produces. This vector is useful for computing residuals. |
... | optional arguments to be given to each invocation of FUN. |
default | a single value. If FUN is a function that returns a scalar value and simplify is TRUE then array entries in the return value that have no data behind them will be given the value default. I.e., this should be the result of FUN(X[0]). |
simplify | a logical value. If FALSE, tapply always returns an array of mode list. If TRUE (the default) and FUN returns a single atomic value for each cell, tapply returns an array with the same mode as those values. See Value for more information. (simplify is ignored if FUN is not supplied.) |
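The effect of default is easiest to see with a cell that has no data. The following is a minimal sketch, assuming a version of tapply that accepts the default argument as shown in the usage line; the names g and v are hypothetical and chosen only for illustration.

```r
# Hypothetical data: the factor has a level "c" that never occurs,
# so the cell for "c" has no data behind it.
g <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
v <- c(1, 2, 3, 4)

# With the default (NA), the empty cell is reported as NA.
s_na <- tapply(v, g, sum)

# sum() of an empty vector is 0, so default = 0 is the natural fill value.
s_zero <- tapply(v, g, sum, default = 0)

print(s_na)
print(s_zero)
```

Choosing default to match what FUN would return for empty input keeps empty cells consistent with the non-empty ones.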
If FUN is missing | returns a vector of indices. These indices give the position for each element of X in the array that would be returned if FUN were not missing. |
If FUN is present | calls FUN for each cell that has any data in it. |
If FUN returns a single atomic value for each cell (for example, functions mean or var) | returns a multi-way array containing the values. The array has the same number of dimensions as INDEX has components. The number of levels in a dimension matches the number of levels in the corresponding component of INDEX. If INDEX has only one component, this is a one-dimensional array. |
If FUN does not return a single atomic value | returns an array of mode "list" whose components are the values of the individual calls to FUN. In other words, the result is a list that has a dim attribute. (This prints as a list, but you can subscript it as you would an array.) |
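As an illustration of the index vector returned when FUN is missing, the sketch below computes residuals from cell means. The data and the names y, grp, cell_means, cell_index, and res are hypothetical.

```r
# Hypothetical data: three groups with two observations each
y <- c(10, 12, 20, 22, 30, 34)
grp <- c("A", "A", "B", "B", "C", "C")

# One mean per cell
cell_means <- tapply(y, grp, mean)

# With FUN omitted, the result gives each observation's position
# in the array of cell means.
cell_index <- tapply(y, grp)

# Subscript the means by the index vector to line them up with y,
# then subtract to get the residuals.
res <- y - cell_means[cell_index]
print(res)
```

Because the indices refer to positions in the full array, the same subscripting approach should carry over when INDEX has several components.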
x <- as.factor(c('a', 'b', 'c', 'a'))
tapply(x, x, length)  # counts elements, similar to table
# returns 1-D integer array:
# a b c
# 2 1 1
tapply(x, x)  # returns integer vector of positions in array
# [1] 1 2 3 1

# data with NAs
x <- 1:40
x[c(5, 30)] <- NA
ind <- rep(c('A', 'B', 'C'), length = 40)
tapply(x, ind, mean)  # call mean on cells
#    A    B    C
# 20.5   NA   NA
tapply(x, ind, mean, na.rm = TRUE)
# pass the extra argument na.rm = TRUE to mean(),
# so mean will ignore NA values
#     A     B     C
# 20.50 21.25 20.25
tapply(x, ind, mean, na.rm = TRUE, simplify = FALSE)
# get same result as 1-D array of mode 'list'
# $A
# [1] 20.5
#
# $B
# [1] 21.25
#
# $C
# [1] 20.25

ind1 <- rep(c('A', 'B', 'C', 'D'), length = 40)
tapply(x, list(ind, ind1), mean)
#    A  B  C  D
# A 19 22 19 22
# B NA 20 23 20
# C 21 NA 21 24
ind2 <- rep(c('A', 'B', 'C', 'D', 'E'), length = 40)
# Returns array with dim 3x4x5
tapply(x, list(ind, ind1, ind2), mean, na.rm = TRUE)
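
The case where FUN returns more than one value per cell can be sketched in the same way. The block below rebuilds the data from the examples so that it runs on its own; the name r is hypothetical.

```r
# Same data as in the examples above
x <- 1:40
x[c(5, 30)] <- NA
ind <- rep(c("A", "B", "C"), length = 40)
ind1 <- rep(c("A", "B", "C", "D"), length = 40)

# range() returns two values per cell, so the result cannot be
# simplified and comes back as an array of mode "list".
r <- tapply(x, list(ind, ind1), range, na.rm = TRUE)

mode(r)        # "list"
dim(r)         # 3 4
r[["A", "B"]]  # the two-element range for cell (A, B)
```

Although the result prints as a list, it carries a dim attribute, so a single cell can be extracted with two subscripts as shown.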