tapply
Apply a Function to a Ragged Array

Description

Partitions a vector according to one or more indices and applies a function to each resulting group. Each index is a vector of logical or factor values the same length as the data vector. (To use more than one index, create a list of index vectors.)

Usage

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

Arguments

X a vector of data to be grouped by index. Missing values (NAs) are allowed if FUN accepts them.
INDEX a list whose components are interpreted as factors, each the same length as X.

The elements of the indices define a cell in a multi-way array corresponding to each X observation. Missing values (NAs) are allowed. The names in INDEX are used as the names of the dimnames of the result. If a vector is given, it is treated as a list with one component.

FUN a function, a character string, or a symbol giving the name of the function to apply to each cell.

If FUN is omitted or NULL, tapply returns a vector that can be used to subscript the multi-way array that tapply normally produces. This vector is useful for computing residuals: subscripting the array of per-cell summaries with it gives the summary for each element of X.

... optional arguments to be given to each invocation of FUN.
default a single value. Used only when simplify is TRUE and FUN returns a single (scalar) value for each cell: array entries in the return value that have no data behind them are given the value default. By default such entries are NA. A natural choice is FUN(X[0]), the value FUN returns for zero-length input (for example, 0 when FUN is sum).
simplify a logical value. If FALSE, tapply always returns an array of mode "list". If TRUE (the default) and FUN returns a single (scalar) value for every cell, tapply returns an array with the mode of that value. See Value for more information. (simplify is ignored if FUN is not supplied.)
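
A minimal sketch of the FUN = NULL and default arguments, using small hypothetical vectors (y, g, h and v are illustrative names, not part of tapply). It assumes an implementation that supports default (GNU R 3.4.0 or later does); the values in the comments were worked out for GNU R and may differ elsewhere.

```r
# Hypothetical data: six observations in two groups
y <- c(2, 4, 6, 1, 3, 5)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# FUN omitted: the cell position of each element of y (1 1 1 2 2 2)
pos <- tapply(y, g)

# Group means as a plain vector (a = 4, b = 3), expanded to one mean per element
group_means <- as.vector(tapply(y, g, mean))
resid <- y - group_means[pos]
resid
# [1] -2  0  2 -2  0  2

# A factor with an unused level "c": that cell has no data behind it
h <- factor(c("a", "a", "b"), levels = c("a", "b", "c"))
v <- c(1, 2, 3)
with_na   <- tapply(v, h, sum)               # a = 3, b = 3, c = NA
with_zero <- tapply(v, h, sum, default = 0)  # a = 3, b = 3, c = 0, i.e. sum(v[0])
```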

Details

Evaluates a function, FUN, on data values that correspond to each cell of a multi-way array.
Related functions are listed under See Also.
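
For a single index, the per-cell evaluation described above can be sketched as split() followed by sapply(). This is a conceptual illustration on hypothetical data, not a description of how tapply is implemented; unlike the sketch, tapply does not call FUN on empty cells and returns an array rather than a named vector.

```r
# Hypothetical data: four values in two groups
x <- c(10, 20, 30, 40)
g <- factor(c("u", "v", "u", "v"))

# tapply: a 1-D array with dimnames u, v
via_tapply <- tapply(x, g, mean)

# Conceptual equivalent: split x into cells, then apply mean to each cell
via_split <- sapply(split(x, g), mean)

# Both give u = 20, v = 30
all.equal(as.vector(via_tapply), unname(via_split))
# [1] TRUE
```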

Value

If FUN is missing: returns a vector of indices. These indices give the position of each element of X in the array that would be returned if FUN were not missing.
If FUN is present: calls FUN for each cell that has any data in it.
If FUN returns a single atomic value for each cell (for example, functions mean or var): returns a multi-way array containing the values. The array has the same number of dimensions as INDEX has components. The number of levels in a dimension matches the number of levels in the corresponding component of INDEX. If INDEX has only one component, this is a one-dimensional array.
If FUN does not return a single atomic value: returns an array of mode "list" whose components are the values of the individual calls to FUN. In other words, the result is a list that has a dim attribute. (This prints as a list, but you can subscript it as you would an array.)
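
The sketch below, on hypothetical vectors, illustrates these cases. With two indices, the positions returned when FUN is missing count cells column by column through the 2 x 2 result; when FUN returns two values per cell (as range does), the result is an array of mode "list". The outputs in the comments assume GNU R.

```r
# Hypothetical data: four observations classified by two factors
f1 <- factor(c("a", "b", "a", "b"))
f2 <- factor(c("x", "x", "y", "y"))
vals <- c(5, 1, 9, 2)

# FUN missing: positions in the 2 x 2 result, counted column by column
pos <- tapply(vals, list(f1, f2))
pos
# [1] 1 2 3 4

# FUN returns a single value per cell: a 2 x 2 numeric array
tapply(vals, list(f1, f2), sum)
#   x y
# a 5 9
# b 1 2

# FUN returns two values per cell: an array of mode "list" with a dim attribute
by_f1 <- tapply(vals, f1, range)
mode(by_f1)    # "list"
by_f1[["a"]]   # 5 9
```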

See Also

by, table, loglin, apply, lapply, sapply

Examples

x <- as.factor(c('a', 'b', 'c', 'a'))
tapply(x, x, length)
# counts elements, similar to table
# returns 1-D integer array:
#   a b c
#   2 1 1
tapply(x, x)
# returns integer vector of positions in array
#  [1] 1 2 3 1

# data with NAs
x <- 1:40
x[c(5, 30)] <- NA
ind <- rep(c('A', 'B', 'C'), length = 40)

tapply(x, ind, mean)
# call mean on cells
#    A    B    C
# 20.5   NA   NA

tapply(x, ind, mean, na.rm = TRUE)
# pass the extra argument na.rm = TRUE to mean(),
# so mean() ignores NA values
#     A     B     C
# 20.50 21.25 20.25

tapply(x, ind, mean, na.rm = TRUE, simplify = FALSE)
# same values, returned as a 1-D array of mode "list"
# $A
# [1] 20.5
#
# $B
# [1] 21.25
#
# $C
# [1] 20.25

ind1 <- rep(c('A', 'B', 'C', 'D'), length = 40)
tapply(x, list(ind, ind1), mean)
#    A  B  C  D
# A 19 22 19 22
# B NA 20 23 20
# C 21 NA 21 24

ind2 <- rep(c('A','B','C','D', 'E'), length = 40)

# returns an array with dim 3 x 4 x 5
tapply(x, list(ind, ind1, ind2), mean, na.rm = TRUE)
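
A quick check of the shape claimed in the last comment, repeating the example data so the block runs on its own. Because the three indices cycle with periods 3, 4 and 5, each of the 40 observations lands in its own cell: 20 of the 60 cells have no data and hold NA, and the two cells whose only value is NA give NaN, because mean() of an empty vector is NaN. The counts assume GNU R's behavior.

```r
# Same data as in the examples above
x <- 1:40
x[c(5, 30)] <- NA
ind  <- rep(c('A', 'B', 'C'), length.out = 40)
ind1 <- rep(c('A', 'B', 'C', 'D'), length.out = 40)
ind2 <- rep(c('A', 'B', 'C', 'D', 'E'), length.out = 40)

r <- tapply(x, list(ind, ind1, ind2), mean, na.rm = TRUE)
dim(r)           # 3 4 5
sum(is.nan(r))   # 2: cells whose only value is NA
sum(is.na(r))    # 22: the 2 NaN cells plus 20 cells with no data
```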

Package base version 6.1.1-7