by
Split a Data Frame and Apply a Function to the Parts

Description

Splits the rows of a data frame or the elements of a vector according to the values of a grouping index, and then calls the specified function on each group of rows or elements.

Usage

by(data, INDICES, FUN, ..., simplify = TRUE)
by.data.frame(data, INDICES, FUN, ..., simplify = TRUE)

Arguments

data an object, normally a data frame. By default, data is converted to a data frame.
INDICES a factor or a list of several factors. The length of each factor should be the same as the number of rows of data. Missing values (NAs) are allowed. The rows or elements of data are split into groups based on the unique combinations of the levels of INDICES, and FUN is computed on each data.frame formed by the groups. The names of INDICES are used as the names of the dimnames of the result. If INDICES is not a factor or a list of factors, it is coerced to factors internally.
FUN a function whose first argument is a data frame. FUN is called once on each group formed from data based on the unique combinations of the levels of INDICES.
... all other arguments are passed to FUN each time it is called.
simplify a logical flag passed to the internal call to tapply. If FALSE, tapply always returns an array of mode list. If TRUE (the default) and FUN always returns a single value, tapply returns an array whose mode is that of the returned values. (simplify is ignored if FUN is not supplied.)
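The effect of simplify can be seen in a minimal sketch such as the following. The data frame df and the helper group_mean are hypothetical names made up for illustration, and the results in the comments assume that tapply behaves as described above.

```r
# Hypothetical toy data, not taken from Sdatasets
df <- data.frame(g = factor(c("a", "a", "b")), x = c(1, 3, 5))

# FUN returns a single number for each group
group_mean <- function(d) mean(d$x)

simplified   <- by(df, df$g, group_mean)                    # simplify = TRUE
unsimplified <- by(df, df$g, group_mean, simplify = FALSE)

mode(simplified)    # "numeric": the cells hold the scalars directly
mode(unsimplified)  # "list": each cell is a list element
```

Setting simplify = FALSE may be the safer choice when later code always expects a list, whatever FUN returns.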

Details

by.data.frame takes a data frame and a list of indices, each of which should have one entry for each row (observation) in the data frame. For each unique combination of values in the factors, it extracts the rows in the data frame whose corresponding indices have that combination of values, and then calls the function FUN on those rows of the data frame as its argument.
The by() function is a convenient, object-oriented version of tapply().
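The split-and-apply behavior described above can be approximated by hand with split() and lapply(). This is only a sketch of the idea, not the actual implementation of by.data.frame; the data frame df and the function weighted_total are hypothetical.

```r
# A small made-up data frame (hypothetical values, for illustration only)
df <- data.frame(
  group = factor(c("a", "a", "b", "b", "b")),
  x     = c(1, 2, 3, 4, 5),
  y     = c(10, 20, 30, 40, 50)
)

# FUN receives the rows of one group as a data frame
weighted_total <- function(d) sum(d$x * d$y)

# by(): call FUN once per group
res_by <- by(df[, c("x", "y")], df$group, weighted_total)

# Roughly the same computation done manually
parts     <- split(df[, c("x", "y")], df$group)
res_split <- lapply(parts, weighted_total)

res_by[["a"]]     # value computed by by() for group "a"
res_split[["a"]]  # the same value from the manual version
```

The manual version returns a plain list, whereas by() returns an object of class by that carries the grouping information in its dimnames.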

Value

Returns an object of class by. This class consists of an array of mode list with one dimension for each index in INDICES, the dimension being the number of levels in that index. The dimnames of the object give the levels of the indices, and the names of the dimnames give the names of the indices. If the list given as INDICES has no names, then by tries to create reasonable names. If no observations correspond to some elements of the array, those elements have the value NULL (FUN is not called for those empty cells).
This object is intended to be printed by print.by, the print method of objects of class by. For each cell in the array, it prints the value of each index, and then prints the value of the cell. It prints a separator line (by default, a series of dashes) between the cells.
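The structure of the returned object can be inspected directly, as in the following sketch. The data frame df and its columns are hypothetical, and simplify = FALSE is used so that the result is an array of mode list as described above.

```r
# Hypothetical data with two grouping factors; the combination
# sex = "m", site = "y" has no observations
df <- data.frame(
  sex  = factor(c("f", "f", "m")),
  site = factor(c("x", "y", "x")),
  v    = c(1, 2, 3)
)

# A named list of factors supplies the names of the dimnames
res <- by(df, list(sex = df$sex, site = df$site),
          function(d) nrow(d), simplify = FALSE)

class(res)             # class of the result
dim(res)               # one dimension per index, sized by its levels
names(dimnames(res))   # names taken from the INDICES list
res[["f", "x"]]        # value of FUN for a populated cell
is.null(res[["m", "y"]])  # checks whether the empty cell holds NULL
```

Indexing the cells with `[[` returns the stored values without going through the print method.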

See Also

lapply, sapply, tapply.

Examples

by(Sdatasets::Puromycin[,1:2], Sdatasets::Puromycin$state, summary)
by(Sdatasets::Puromycin[,2], Sdatasets::Puromycin$state, colMeans)

by(Sdatasets::fuel.frame[,1:3], Sdatasets::fuel.frame$Type, summary)
by(Sdatasets::fuel.frame[,1:3], Sdatasets::fuel.frame$Type, cor)

Package base version 6.1.1-7