by
Split a Data Frame and Apply a Function to the Parts
Description
Splits the rows of a data frame or the elements of a vector according to the
values of a grouping index,
and then calls the specified function on each group of rows or elements.
Usage
by(data, INDICES, FUN, ..., simplify = TRUE)
by.data.frame(data, INDICES, FUN, ..., simplify = TRUE)
Arguments
data |
an object, normally a data frame.
Object data is converted to a data.frame by default.
|
INDICES |
a factor or a list of several factors.
The length of each factor should be the same as the number of rows of
data.
Missing values (NAs) are allowed.
The rows or elements of data are split into groups
based on the unique combinations of the levels of INDICES,
and FUN is computed on each data.frame formed by the groups.
The names of INDICES are used as the names of the dimnames of the
result. If INDICES is not a factor or a list of factors,
it is coerced to factors internally.
|
FUN |
a function whose first argument is a data frame.
FUN is called once on each group formed from data
based on the unique combinations of the levels of INDICES.
|
... |
all other arguments are passed to FUN each time it is called.
|
simplify |
a logical flag. It is used by an internal call to the function
tapply. If FALSE, tapply always returns an array
of mode list. If TRUE (the default),
and if FUN always returns a single (scalar) value,
then tapply returns an array whose mode is that of the scalar.
(simplify is ignored if FUN is not supplied.)
|
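The following is a minimal sketch of how the ... and simplify arguments
described above might be used. The data frame df and its columns are
hypothetical illustration data, not part of Sdatasets, and the comments
assume that by behaves as documented in this topic.

```r
# Hypothetical illustration data (not from Sdatasets)
df <- data.frame(
  dose  = c(1, 2, 3, 4, 5, 6),
  resp  = c(10, 12, NA, 20, 22, 24),
  group = factor(c("a", "a", "a", "b", "b", "b"))
)

# na.rm = TRUE is not an argument of by(); it is passed on to colMeans via '...'
means <- by(df[, c("dose", "resp")], df$group, colMeans, na.rm = TRUE)
means[["a"]]  # column means for group "a": dose 2, resp 11

# FUN returns one number per group, so simplify = TRUE yields a numeric array
scalar_simplified <- by(df, df$group, function(d) mean(d$resp, na.rm = TRUE))

# With simplify = FALSE the result is always an array of mode list
scalar_as_list <- by(df, df$group, function(d) mean(d$resp, na.rm = TRUE),
                     simplify = FALSE)

typeof(scalar_simplified)  # "double"
typeof(scalar_as_list)     # "list"
```

Because colMeans returns more than one value per group, the first result
stays an array of mode list regardless of simplify.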
Details
by.data.frame takes a data frame and a list of indices,
each of which should have one entry for each row (observation)
in the data frame.
For each unique combination of values in the factors, it extracts
the rows in the data frame whose corresponding indices have that
combination of values, and then calls the function FUN on
those rows of the data frame as its argument.
The by() function is a convenient,
object-oriented wrapper for tapply() applied to data frames.
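The split-then-apply behavior described above can be approximated, for a
single grouping factor, with split and sapply. This is only a sketch of the
idea, not the actual implementation of by.data.frame; the data frame df is
hypothetical.

```r
# Hypothetical illustration data
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  g = factor(c("a", "b", "a", "b", "a", "b"))
)

# by(): FUN is called once per level of g, on the rows belonging to that level
via_by <- by(df, df$g, function(d) sum(d$x))

# Roughly the same computation spelled out: split the rows, then apply FUN
via_split <- sapply(split(df, df$g), function(d) sum(d$x))

as.vector(via_by)  # 9 12
via_split          # a = 9, b = 12
```

The by result additionally carries the class by and dimnames named after the
indices, which the plain sapply result does not.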
Value
Returns an object of class by.
This class consists of an array of mode list with one dimension
for each index in INDICES,
the extent of each dimension being the number of levels in that index.
The dimnames of the object give the levels of the indices,
and the names of the dimnames give the names of the indices.
If the list given as INDICES has no names,
then by tries to create reasonable names.
If no observations correspond to some elements of the array,
those elements have the value NULL
(FUN is not called for those empty cells).
This object is intended to be printed by print.by,
the print method of objects of class by.
For each cell in the array, it prints the value of each index, and
then prints the value of the cell.
It prints a separator line (by default, a series of dashes) between
the cells.
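As a sketch of the structure described in this section, the example below
groups by two named factors where one combination of levels has no rows. The
data frame and the index names first and second are hypothetical;
simplify = FALSE is used so that the result is an array of mode list and the
empty cell is NULL.

```r
# Hypothetical illustration data; the combination (b, v) has no rows
df <- data.frame(
  y  = c(1, 2, 3, 4),
  f1 = factor(c("a", "a", "b", "b")),
  f2 = factor(c("u", "v", "u", "u"))
)

res <- by(df, list(first = df$f1, second = df$f2),
          function(d) nrow(d), simplify = FALSE)

dim(res)              # 2 2: one dimension per index, one entry per level
names(dimnames(res))  # "first" "second"
res[["b", "u"]]       # 2 rows fall in this cell
is.null(res[["b", "v"]])  # TRUE: FUN was not called for the empty cell

# print.by shows each cell preceded by its index values, with separator lines
print(res)
```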
See Also
Examples
by(Sdatasets::Puromycin[,1:2], Sdatasets::Puromycin$state, summary)
by(Sdatasets::Puromycin[,2], Sdatasets::Puromycin$state, colMeans)
by(Sdatasets::fuel.frame[,1:3], Sdatasets::fuel.frame$Type, summary)
by(Sdatasets::fuel.frame[,1:3], Sdatasets::fuel.frame$Type, cor)