colVars
Row and Column Variances and Standard Deviations

Description

Returns the variance or standard deviation for the specified columns, rows, or dimensions of an array.

Usage

colVars(x, na.rm = FALSE, dims = 1, unbiased = TRUE, SumSquares = FALSE,
    weights = NULL, freq = NULL, n = NULL)
colStdevs(x, na.rm = FALSE, dims = 1, unbiased = TRUE, SumSquares = FALSE,
    weights = NULL, freq = NULL, n = NULL)
rowVars(x, na.rm = FALSE, dims = 1, unbiased = TRUE, SumSquares = FALSE,
    weights = NULL, freq = NULL, n = NULL)
rowStdevs(x, na.rm = FALSE, dims = 1, unbiased = TRUE, SumSquares = FALSE,
    weights = NULL, freq = NULL, n = NULL)

Arguments

x a matrix, data frame, array, or numeric vector.
na.rm a logical value that specifies how to handle missing values (NAs) in x. If FALSE (the default), missing values in the input will result in missing values in corresponding elements of the output. If TRUE, missing values are omitted from calculations.
dims an integer value that specifies the number of dimensions to treat as rows. For example, if x is a five-dimensional array and dims = 3, then rowVars returns a three-dimensional array of variances computed across the last two dimensions, and colVars returns a two-dimensional array of variances computed across the first three dimensions.
unbiased a logical value that specifies whether variances are computed with the unbiased (n-1) divisor.
  • If TRUE (the default) variances are sample variances. For example, for a vector:
    sum((x-mean(x))^2)/(n-1)
    where n is the length of the vector.
    This is unbiased if the values in x are obtained by simple random sampling.
  • If FALSE it uses the definition
    sum((x-mean(x))^2)/n
This argument is ignored if SumSquares = TRUE.
SumSquares a logical value. If TRUE, unnormalized sums of squares are returned, with no division by either n or (n-1), and the unbiased argument is ignored.
weights a numeric vector that has the same number of observations as x. If x is a matrix, this is the number of rows for colVars and colStdevs, or the number of columns for rowVars and rowStdevs. If you specify a value for weights, then the unbiased argument is ignored. The definitions used when weights are specified are:
sum(weights * (x - weighted.mean(x, weights=weights))^2)/sum(weights)
if SumSquares = FALSE and
sum(weights * (x - weighted.mean(x, weights=weights))^2)
if SumSquares = TRUE
freq a numeric vector of positive integers that has the same number of observations as x. If present, the kth row of x is treated as if it were repeated freq[k] times. The effect is similar to that of the weights argument, except that the unbiased argument is not ignored: if unbiased is TRUE, division is by (sum(freq)-1) rather than (n-1).
n an integer that specifies the number of rows; if supplied this overrides the actual number of rows in x. This is useful for obtaining summaries on regular subsets of the data.
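
The weighted and frequency-based definitions above can be sketched in base R for a single vector. This is only an illustration of the formulas as stated: weighted_var and freq_var are hypothetical helper names, not functions in terrUtils, and the sketch uses the w argument of base R's weighted.mean.

```r
# Hypothetical helpers illustrating the documented definitions; not part of terrUtils.
weighted_var <- function(x, weights, SumSquares = FALSE) {
  # Weighted sum of squared deviations from the weighted mean.
  ss <- sum(weights * (x - weighted.mean(x, w = weights))^2)
  if (SumSquares) ss else ss / sum(weights)
}

freq_var <- function(x, freq, unbiased = TRUE) {
  # Repeat the kth observation freq[k] times, then use the usual definition.
  expanded <- rep(x, times = freq)
  ss <- sum((expanded - mean(expanded))^2)
  ss / (if (unbiased) sum(freq) - 1 else sum(freq))
}

x <- c(2, 4, 7)
freq <- c(1, 3, 2)

freq_var(x, freq)                    # same as var(rep(x, freq))
weighted_var(x, weights = freq)      # same as freq_var(x, freq, unbiased = FALSE)
```

With integer weights, the weighted definition coincides with the frequency definition only when unbiased = FALSE, because the weighted form always divides by sum(weights).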

Details

colVars(x) is equivalent to diag(var(x)) if x is a matrix, but is faster (and uses column names).
The primary use of n is to compute summaries for a vector without first turning it into an array.
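
The two statements above can be checked with base R alone. The sketch below does not call colVars; it uses apply as a stand-in, and it assumes that n is interpreted as reshaping the vector into a matrix with n rows, filled column by column (which is how the examples on this page use it).

```r
# A small matrix with column names.
x <- matrix(c(1, 4, 2, 8, 5, 7, 3, 9, 6, 2, 8, 1), nrow = 4,
            dimnames = list(NULL, c("a", "b", "c")))

# Column variances computed two ways; both results carry the column names.
v_diag  <- diag(var(x))
v_apply <- apply(x, 2, var)
all.equal(v_diag, v_apply)

# Assumed equivalent of summarizing a vector with n = 10:
# reshape into 10 rows, then take the variance of each column.
y <- 1:50
by_group <- apply(matrix(y, nrow = 10), 2, var)
length(by_group)   # one variance per group of 10 consecutive values
```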
Variances are computed by the numerically accurate corrected two-pass method described in Chan, Golub, and LeVeque (1983).
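
For orientation, the corrected two-pass method from Chan, Golub, and LeVeque (1983) can be sketched for a single vector as follows. This is a minimal base-R illustration of the published algorithm, not the terrUtils source; corrected_two_pass_var is a hypothetical name.

```r
# Hypothetical helper; illustrates the corrected two-pass algorithm only.
corrected_two_pass_var <- function(x, unbiased = TRUE) {
  n <- length(x)
  m <- sum(x) / n     # first pass: compute the mean
  d <- x - m          # second pass: deviations from the mean
  # sum(d) is zero in exact arithmetic; the second term corrects for
  # rounding error in the computed mean.
  ss <- sum(d^2) - sum(d)^2 / n
  ss / (if (unbiased) n - 1 else n)
}

# Values with a large common offset, where a one-pass formula can lose accuracy.
x <- c(1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16)
corrected_two_pass_var(x)
var(x)
```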

Value

Returns the variances, standard deviations, or sums of squares by row or column. Generally, the return value is a vector, but if x is an array and the value of dims implies that the result has at least two dimensions, the return value is a matrix or an array.
If you specify n, then a vector without names is returned (dims is ignored). Otherwise, if x contains names or dimnames, the result also contains names or dimnames.
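
To see how dims determines the shape of the result, the following base-R sketch mimics the row and column summaries with apply on a 2x3x4 array. It is an analogue under the reading of dims given in the Arguments section, not a call to the terrUtils functions.

```r
x <- array(seq_len(24)^2, dim = c(2, 3, 4))

# Analogue of rowVars(x, dims = 2): keep the first two dimensions,
# summarize across the remaining one.
r2 <- apply(x, 1:2, var)

# Analogue of colVars(x) (dims = 1): summarize across the first dimension,
# keep the remaining two.
c1 <- apply(x, 2:3, var)

# Analogue of colVars(x, dims = 2): summarize across the first two dimensions.
# as.vector() is needed because var() of a matrix returns a covariance matrix.
c2 <- apply(x, 3, function(s) var(as.vector(s)))

dim(r2)      # 2 3
dim(c1)      # 3 4
length(c2)   # 4
```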

References

Chan, T., Golub, G., and LeVeque, R. (1983). Algorithms for computing the sample variance: analysis and recommendations. The American Statistician, 37, 242-247.

Differences between TIBCO Enterprise Runtime for R and Open-source R

See Also

apply, is.na, is.nan, mean, stdev, sum, var, colMeans, colMedians, colMins, colProds, colQuantiles.

Examples

x <- matrix(1:12, 4)
rowVars(x)
colStdevs(x)

# Summaries for regular subsets of a vector
x <- 1:50
colVars(x, n=10) # 5 groups of 10 consecutive observations

# Higher-dimensional array
x <- array(runif(24), dim=c(2,3,4))
rowVars(x) # vector of length 2
rowVars(x, dims=2) # 2x3 matrix
apply(x, 1:2, var) # same as previous
colVars(x) # 3x4 matrix
colVars(x, dims=2) # vector of length 4
colVars(aperm(x, c(2,1,3))) # 2x4 matrix
colVars(x[1,,]) # vector of length 4
diag(var(x[1,,])) # same as previous

# Investigate the distribution of the sample mean and t-statistic
# when the underlying population is not normal
x <- rexp(1000 * 20) # 1000 samples of size 20
means <- colMeans(x, n=20)
stdevs <- colStdevs(x, n=20)
qqnorm(means)
plot(means, stdevs) # These would be independent for a normal population
qqnorm( (means - 1) / stdevs )

# The first three lines in that study could be replaced with
x <- matrix(rexp(1000 * 20), 20) # 1000 samples of size 20
means <- colMeans(x)
stdevs <- colStdevs(x)

Package terrUtils version 6.0.0-69