Subscript
Extract or Replace Parts of an Object
Description
Extracts or replaces parts of an object. Can apply to a vector, a list, an
array, a factor, or a data frame.
These functions are generic operators. (See the help for Methods for details.)
Method functions can be written to handle specific
S Version 3 classes of data. Classes that already have methods for
these functions include:
- "anova"
- "data.frame"
- "factor"
- "model.matrix"
- "smooth"
- "tree"
Usage
x[i]
x[i, j, ...]
x[i, j, ..., drop = TRUE]
x[[..., exact = TRUE]]
x[i, drop=FALSE]
x[i, j, drop=<<see below>>]
x[<args>] <- value
x[[i]]
x[[i, j, ..., exact = TRUE]]
x[[<args>]] <- value
x[i, j] <- value
x[[i, j]] <- value
x[i] <- value
x$name
x$name <- value
Arguments
x |
an object. See Details for more information.
|
i, j, ... |
subscript expressions used to identify the elements to extract or
replace. The expressions can be empty, which corresponds to all
possible subscripts and therefore extracts/replaces the entire object
(or the entire dimension of the object). The expressions may also be
logical, numeric, or character. Numeric subscripts should be integers,
such as the output from : (the sequence operator).
- If x is a data frame:
- If a single argument is given that is not a matrix or list
(x[j] or
x[[j]]), then
x is treated as a frame or list and the
index j is assumed to index the variables
of the data frame.
- If the subscripting appears to be a matrix (that is,
x[i,j], x[i,], or x[,j]),
then i and j apply to the rows (observations) and columns (variables), respectively. The methods treat x as a matrix.
- If x is a factor, then i is the index for extraction or
replacement (either a numeric or logical). For the [[ form, the
index must be a single integer value.
|
name |
a name or quoted string identifying the component to extract or
replace. This is used when subscripting objects such as lists and data
frames.
|
drop |
a logical flag. The class of the result might differ from the class of the
original object, particularly when drop=TRUE.
- For a data frame, when drop is not supplied, subscripting a
single row returns a data frame (like drop=FALSE)
but subscripting a single column returns a vector. Explicitly
setting drop=TRUE and subscripting
one row returns a list. Explicitly setting
drop=FALSE and subscripting one column
returns a data frame.
- For a factor, if TRUE, the unused levels of the factor
are dropped. The default is FALSE.
- For an ordinary matrix or array, if
drop=TRUE, dimensions
of length 0 or 1 are dropped from the return object.
For example, assume you have a 5 by 10 matrix M. The
expression M[,1] produces a
vector of length 5 and M[,1,drop=FALSE]
produces a 5 by 1 matrix. Setting drop=FALSE
is most useful in functions where consistency is important; the
expression M[,j,drop=FALSE] always produces a
two-dimensional matrix regardless of the length of
j.
|
exact |
a logical flag. If TRUE (the default), character indices match
names or dimnames of the object. When exact=FALSE, partial
matching is used.
|
ignore.row.names |
a logical flag intended to be used when extracting the same row more than once from a data.frame
to make a new data.frame (for example, when doing resampling).
A problem arises because the row names of a data.frame must be unique.
If FALSE (the default), the row names for the output data.frame are based
on those of the input, with digits appended to them as needed to make
them unique (using the make.unique function).
If TRUE, the row names for the output are the first nrow(output)
whole numbers, converted to character strings; they are not based on the input row names.
Using ignore.row.names=TRUE can save time when doing resampling
on large data.frames.
|
value |
the replacement value for the relevant piece of the object.
- For a data frame, unless value is a constant, we recommend that
this argument be a data frame if you want to replace data in more than
one variable. Also, the replacement value can be an atomic vector or a
list. For double subscripts, value should not be a list or data frame.
- For a factor, value can, but need not, be a factor. In any
case, the values are interpreted in terms of the level set of the factor.
Values not in the level set will generate NAs.
|
Details
All zero indices are dropped before subscripting.
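As a minimal sketch of the zero-index rule, assuming an R-compatible engine (the vector x and the result names are hypothetical):

```r
x <- c(10, 20, 30)

# Zeros in the index are ignored; only positions 2 and 3 are used.
kept <- x[c(0, 2, 0, 3)]

# An index made up only of zeros selects nothing.
none <- x[0]
```

Here kept holds the second and third elements, and none is a zero-length vector of the same mode as x.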
If x is a data frame, the following operators apply:
- x[i, j, drop=<<see drop description>>]
- x[i, j] <- value
- x[[..., exact = TRUE]]
- x[[i, j]] <- value
If x is a factor, the following operators apply:
- x[i]
- x[i, drop=FALSE]
- x[i] <- value
- x[[i]]
If
x is a vector, list, or array, the following operators apply:
- x[i]
- x[i, j, ...]
- x[i, j, ..., drop = TRUE]
- x[<args>] <- value
- x[[i]]
- x[[i, j, ..., exact = TRUE]]
- x[[<args>]] <- value
- x$name
- x$name <- value
If x is a factor, the [[ form selects a single element from the factor
object, and the result is a factor object containing a single element
from x with the same levels attribute as x.
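The factor behavior can be illustrated with a short sketch, assuming an R-compatible engine (the factor f and the result names are hypothetical):

```r
f <- factor(c("low", "high", "mid", "high"))

# [[ selects one element; the result keeps the full level set of f.
one <- f[[2]]

# [ keeps all levels by default, even those no longer used.
sub <- f[c(2, 4)]

# drop = TRUE removes levels that do not occur in the subset.
sub_dropped <- f[c(2, 4), drop = TRUE]
```

Comparing levels(sub) with levels(sub_dropped) shows the effect of the drop argument on a factor.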
If x is a vector, and if i is empty, all of x is
extracted or replaced without affecting the attributes of x.
For extraction, the value of x[i] has the same mode as x
and the same length as the number of indices. The elements of x[i]
are the elements of x corresponding to the indices, except if
the indices are greater than length(x) or are NA. In
either exception, the returned elements are missing; that is, they are
NA for an atomic mode and NULL for a non-atomic mode.
All attributes of x are discarded in the subset except the names
attribute. The names attribute of x[i] is equal to names(x)[i].
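The extraction rules for vectors can be sketched as follows, assuming an R-compatible engine. The "units" attribute is an arbitrary illustration and not something the operators treat specially:

```r
x <- c(a = 1, b = 2, c = 3)
attr(x, "units") <- "cm"      # hypothetical extra attribute

y <- x[c(1, 3)]               # names are kept, "units" is discarded

beyond <- x[5]                # index past length(x): NA for an atomic mode
lst_beyond <- list(1, 2)[3]   # non-atomic mode: the missing element is NULL
```

Inspecting attributes(y) shows that only the names attribute survives the subset.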
The expression x$name returns the name component of x. It is equivalent
to x[["name"]] if x is recursive and is an error otherwise. Partial
matching is performed on the names of x for extractions. Thus, to extract
a component of a list, you only need to give enough of the name to make
it unique. Replacement of the name component may coerce an object to a
list.
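A brief sketch of component extraction and partial matching, assuming an R-compatible engine (the list lst and its component names are hypothetical):

```r
lst <- list(alpha = 1:3, beta = "b")

# $ accepts a unique abbreviation of the component name.
by_dollar <- lst$al

# [[ matches exactly by default and partially when exact = FALSE.
by_exact   <- lst[["al"]]
by_partial <- lst[["al", exact = FALSE]]

# $ on a non-recursive (atomic) object is an error.
v <- c(a = 1)
atomic_result <- tryCatch(v$a, error = function(e) "error")
```

With exact matching the abbreviated name finds no component, so by_exact is NULL, whereas by_dollar and by_partial both return the alpha component.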
For replacement, the length of x is set
to the largest value in the indices, if that is bigger than the
current length of x. If x and value
do not have the same mode, then one of them is coerced to the common
mode that can represent both without loss of information. This may
mean that replacing elements of an object will change its mode.
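The two replacement effects described above, length extension and mode coercion, can be seen in a small sketch (assuming an R-compatible engine; the names are hypothetical):

```r
x <- 1:3

# Assigning past the end extends x; the skipped positions become NA.
x[6] <- 10L
extended <- x

# Assigning a character value coerces the whole vector to character.
x[2] <- "two"
coerced <- x
```

After the second assignment, mode(coerced) is "character" because character is the common mode that can represent both the old and the new values.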
Arguments are passed to the subscript operators by position, not
by name. The exception to this rule is drop. You must pass this
argument by name. That is, x[2, , drop = TRUE] successfully extracts
the second row of x and results in a list. x[2, , TRUE] results
in an error.
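The following sketch illustrates passing drop by name, assuming an R-compatible engine. The positional failure is shown on an ordinary matrix, where the extra unnamed argument is read as a third subscript; other engines or classes may report the problem differently:

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# drop passed by name: one row of a data frame is dropped to a list.
row_list <- df[2, , drop = TRUE]

M <- matrix(1:6, nrow = 2)

# drop passed by name keeps the matrix shape.
row_mat <- M[2, , drop = FALSE]

# An unnamed third argument is treated as a subscript, which fails here.
positional <- tryCatch(M[2, , FALSE], error = function(e) "error")
```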
The drop argument is ignored if only a single subscript is given. For example,
x[j, drop=TRUE] returns a data frame, not a vector.
When you replace one or more columns of a data frame, the engine checks
that the new values have the right number of rows and, if possible, ensures
that the assignment leaves the data frame in a valid
state. With
x[i,j] <- value, the
characteristics of variables (that is, class, mode, dimension, attributes) in
the result depend on the characteristics of the variables in both
x and
value. For the following, they depend solely on
value:
- x[[j]] <- value
- x[j] <- value
- x[,j] <- value
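As a hedged illustration of the difference, the sketch below replaces a whole column and then a single cell. It assumes an R-compatible engine; the data frame df is hypothetical:

```r
df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))

# Whole-column replacement: the new column takes its type from value.
df[["a"]] <- c("x", "y", "z")
class_a <- class(df$a)

# Cell replacement: the result depends on both the existing column and value.
df[2, "b"] <- 10L
class_b <- class(df$b)
```

Here the integer 10L is absorbed into the existing numeric column b, while column a simply becomes whatever was assigned to it.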
For data frames containing matrices, these operators handle column
subscripts as if each matrix were a single column. For example, if
Y has 3 elements where the third is a
matrix with 5 columns,
Y[,3] is dropped
to a matrix with 5 columns and
Y[,4] is
undefined. With
x[i,j] <- value, if
x contains matrices, then both of the
following are true:
- The i subscript must refer solely to
existing rows. In other cases, the dimensions of the data frame are
expanded as necessary.
- Elements of value are assigned to
columns of x without regard to the
dimensions of matrices in x. We recommend that either
value be a data frame with matrices having the same dimensions as those in
x[i,j], or that you use double subscripting for one variable at a time.
For example, Y[[3]][i,] <- 2*Y[[3]][i,].
There is a special case for x[j], where j is a matrix that is one of the following:
- A logical with the same dimensions as x.
- A numeric with two columns where the first column refers to rows and the second to column numbers.
In either of these situations, all columns of
x are coerced to a common data type and
matrix variables are converted to multiple columns before subscripting
with the matrix
j.
Matrix subscripts are not allowed for the replacement operation x[j] <- value.
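A minimal sketch of extraction with a matrix subscript on a data frame, assuming an R-compatible engine (the data frame df is hypothetical):

```r
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# Two-column numeric matrix: each row is a (row, column) pair.
pairs <- cbind(c(1, 3), c(2, 1))
by_pairs <- df[pairs]          # elements at [1, 2] and [3, 1]

# Logical matrix with the same dimensions as df.
by_logical <- df[df > 2]       # selected in column-major order
```

In both cases the columns are first coerced to a common type, so the result is a plain vector rather than a data frame.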
Attributes of x other than row.names and dup.row.names
are lost if columns are subscripted (for example, x[j],
x[,j], x[i,j]).
Value
- If x is a vector, a list, or an array, the extraction functions
[, [[, and $ return the designated elements or
properties of the object x.
- If x is a data frame, the extraction functions [ and
[[ return the data formed by the designated elements of the
data frame x. When you extract a single row or a single variable (that is,
x[i,] or x[,j]), the returned data can be dropped from
a data frame to a list or a variable, respectively. The expression x[[j]]
returns a single variable unless j is a matrix.
(See Details for more information.)
- If x is a factor, all forms return the factor formed by extracting or
replacing the relevant subset.
Side Effects
The replacement functions [<-
and [[<- replace the designated
elements or properties of the object x.
Background
- The subscript operator [ is
designed to subscript vectors, matrices, and arrays using integers,
logical values, or character values as subscripts.
- The list subscript operator [[ is designed to
subscript lists.
- The component operator $ is designed to
subscript list-like objects with named components, such as data
frames. For detailed information on subscripting data frames,
see [.data.frame. Any expression that evaluates to an appropriate
subscript value can be included in the square brackets.
- The vector subscript corresponds to an element's position or
index in the vector. For example, the sixth element in a vector
x has a subscript (or index) of 6. You
can subscript a data vector by providing a set of indices that
correspond to the elements you wish to keep. If
y is a vector of indices,
x[y] returns the elements in
x that correspond to the indices.
Vectors with Integers
If you supply a set of positive integers to subscript a vector, they
are interpreted as the indices of the elements you want to
keep. The indices do not need to be unique nor do they need to be
given in increasing order. If the requested index for a vector x is
greater than length(x), NA is returned to indicate a missing value.
If you supply a set of negative integers to subscript a vector,
they are interpreted as the indices of the elements you want to exclude
from the result. You cannot combine positive and negative integers to
subscript a vector.
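The integer rules above can be sketched as follows, assuming an R-compatible engine (the vector x and the result names are hypothetical):

```r
x <- c(10, 20, 30, 40)

# Positive indices may repeat and appear in any order.
picked <- x[c(3, 1, 1)]

# Negative indices exclude the named positions.
excluded <- x[-c(1, 2)]

# An index past the end yields NA.
past_end <- x[6]

# Mixing positive and negative indices is an error.
mixed <- tryCatch(x[c(-1, 2)], error = function(e) "error")
```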
Vectors with Logical Values
If you supply a set of logical values to subscript a vector,
the TRUE values are interpreted as the indices
of the elements you want to keep. Logical index vectors are generally
the same length as the vectors to be subscripted. However, this is not
a strict requirement, as the values in a short logical
vector are recycled so that its length matches a longer vector. No values are
returned for indices greater than length(x).
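A short sketch of logical subscripts, including recycling of a short logical vector (assuming an R-compatible engine; names are hypothetical):

```r
x <- c(10, 20, 30, 40)

# Full-length logical index, typically produced by a comparison.
large <- x[x > 15]

# A short logical index is recycled: TRUE, FALSE, TRUE, FALSE.
odd_positions <- x[c(TRUE, FALSE)]
```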
Vectors with Character Values
When you supply a set of character values to subscript a vector, the
values must be from the vector's names
attribute. Thus, this subscripting technique requires the vector to
have a non-null names attribute. You can
use the names<- replacement function to
assign names if necessary.
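For example, the following sketch assigns names and then subscripts by them (assuming an R-compatible engine; the vector and its names are hypothetical):

```r
x <- c(1.5, 2.5, 3.5)

# Assign a names attribute so that character subscripts can be used.
names(x) <- c("a", "b", "c")

# Character subscripts select by name, in the order requested.
by_name <- x[c("c", "a")]
```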
Matrices and Arrays
Subscripting data sets that are matrices or arrays is very similar to
subscripting vectors. In fact, you can subscript them exactly like
vectors if you keep in mind that arrays are stored in column-major
order. You can think of the data values in an array as being stored in
one long vector that has a dim attribute to specify the array's shape.
Column-major order means that the data values fill the array so that the
first index changes the fastest and the last index changes the slowest.
For matrices, this means the data values are filled in column by column.
When a matrix is subscripted in this way with a single index, the element
returned is a single number without dimension attributes, so that it is
no longer recognized as a matrix.
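Column-major storage can be checked with a small sketch, assuming an R-compatible engine. The matrix values are chosen so that each value encodes its own row and column:

```r
# Values are supplied column by column: 11, 21 fill column 1, and so on.
M <- matrix(c(11, 21, 12, 22, 13, 23), nrow = 2)

# Vector-style index 3 is the first element of the second column.
third <- M[3]

# Several vector-style indices return a plain vector with no dim attribute.
some <- M[c(1, 4)]
```

Here third equals M[1, 2], and neither result carries a dim attribute.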
S also lets you use the structure of arrays to your advantage by
allowing you to specify one subscript for each dimension. Since
matrices have two dimensions, you can specify two subscripts inside
the square brackets. The matrix subscripts correspond to the row and
column indices, respectively.
As with vectors, array subscripts can be positive integers,
negative integers, logical vectors, or character vectors if
appropriate. To subscript matrices and arrays by supplying character
subscripts, the supplied values must be from the array's
dimnames attribute. If the subscript for
a given dimension is omitted, all subscripts are assumed.
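The subscript types for matrices can be sketched together, assuming an R-compatible engine (the matrix M and its dimnames are hypothetical):

```r
M <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

cell      <- M["r2", "b"]               # character subscripts from dimnames
cols      <- M[, c(TRUE, FALSE, TRUE)]  # logical subscript; all rows kept
second    <- M[-1, ]                    # negative subscript; dropped to a vector
first_col <- M[, "a", drop = FALSE]     # stays a 2 by 1 matrix
```

Omitting a subscript, as in M[-1, ], selects every index along that dimension.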
Arrays with Matrices
You can extract irregular subsets of arrays by supplying a subscript
matrix representing the positions of the individual elements you wish
to keep. For example, suppose
M is a 3 by
4 matrix and we want to extract two elements from it: the element in
row 1 and column 2, and the element in row 3 and column 3. We can do
this directly with the command
c(M[1,2],
M[3,3]). More generally, we can do this by subscripting
with a matrix:
subscr.mat <- matrix(c(1,2,3,3), ncol=2, byrow=TRUE)
M[subscr.mat]
Warnings
Partial matching of abbreviated dimension names sometimes gives
undesired results. To avoid partial matching, you can use the
match function. For example, to prevent
x["1"] from inadvertently returning
x["10"] when there is no
"1" in the
names attribute, use
x[match("1", names(x))].
To replace a component of a list, you must give the entire name of the
component. If you abbreviate the name, a new component with the
abbreviated name is added.
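The replacement pitfall can be reproduced with a short sketch, assuming an R-compatible engine (the list lst and its component names are hypothetical):

```r
lst <- list(alpha = 1)

# The abbreviated name does not match "alpha" on replacement,
# so a new component named "al" is appended instead.
lst$al <- 2

component_names <- names(lst)
```

After the assignment, the original alpha component is unchanged and the list has two components.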
Subscripting coerces non-integer numeric subscripts to integers using
as.integer. Because
as.integer creates integers by truncating
the numeric representation, this coercion can lead to unexpected
results.
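A minimal sketch of the truncation issue, assuming an R-compatible engine with IEEE double-precision arithmetic (the names are hypothetical):

```r
x <- seq(10, 80, by = 10)

# 2.9 is truncated to 2, not rounded to 3.
truncated <- x[2.9]

# Floating-point arithmetic can land just below a whole number:
# (0.1 + 0.7) * 10 is slightly less than 8, so it truncates to 7.
idx <- (0.1 + 0.7) * 10
surprising <- x[idx]

# Rounding explicitly before subscripting avoids the surprise.
intended <- x[round(idx)]
```

Rounding computed indices explicitly, as in the last line, is one way to guard against this.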
Examples
# Return x values not equal to 5.
x <- 1:12
x[x != 5]
# Sort x by increasing values of y.
x <- 101:112
y <- 12:1
x[order(y)]
# Return all but the first and third elements in x.
x <- 1:12
x[-c(1, 3)]
# Return list(2:3).
list(1:10, 2:3)[2]
# Return the vector 2:3.
list(1:10, 2:3)[[2]]
# Change the value of a matrix element.
x <- matrix(1:12, ncol=4)
x[1,1] <- NA
x[2,3] <- 8.4
# Change missing values to 0.
x[is.na(x)] <- 0
# Create an array with dimension 5x3x2.
A <- array(1:30, c(5, 3, 2))
# Return A.
A[]
# Return a numeric value, the first data value of A.
A[1, 1, 1]
A[1]
# Return a (5,2,2) array.
A[, 1:2, ]
# Return the vector 4:30.
A[A>3]
# Create a list of the numbers 1:4.
al <- as.list(1:4)
# Remove third component.
# The length of al is reduced by 1.
al[[3]] <- NULL
# Replace 2nd and 3rd component by NULL.
al[2:3] <- list(NULL)
# Use a matrix as a subscript.
x <- matrix(1:12, 4)
# Extract x[4,1], x[3,2], x[2,3].
x[cbind(4:2, 1:3)]