Generate Random Samples or Permutations of Data

sample

Description

Generates a random sample of a specified size from a population, or from the integers 1 to n.

Usage

sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL)

Arguments

x	a vector giving a population from which to sample, or a positive integer giving the size of the population n (in which case the population is 1:n). Missing values (NAs) are allowed and are treated like any other value.
size	the sample size. The default is the population size, so with replace=FALSE the default call returns a random permutation of the population.
replace	a logical value. If TRUE, sampling is done with replacement; otherwise sampling is without replacement. The default is FALSE.
prob	a vector of probabilities of length n, giving probabilities of selection for each of the elements of x. The elements of prob are normalized to sum to one. The default NULL gives equal probabilities for each element of the population. Negative or invalid values (NA, Inf, and so on) are not allowed.
n	a positive integer giving the size of the population. A sample is drawn from 1:n. n cannot be larger than the largest positive integer 2147483647.

Details

To generate a sample from 1:n, we recommend using sample.int, because calling sample with a single number is ambiguous: the number could be read as a one-element population or as the population size. For backward compatibility, sample still accepts a positive integer x giving the population size n.
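The ambiguity shows up when a population vector happens to have length one. A minimal sketch, assuming the behavior described under Arguments; resample is a hypothetical helper defined here for illustration, not a function in this package:

```r
# 'resample' is a hypothetical helper: it always treats x as the population,
# even when x has length one, by sampling positions with sample.int.
resample <- function(x, ...) x[sample.int(length(x), ...)]

x <- c(10)          # a population with a single element
length(sample(x))   # 10: x is read as the population size, so 1:10 is permuted
resample(x)         # 10: the single element itself is returned
```

Indexing through sample.int avoids the special case, which matters when the length of x is only known at run time.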

If x represents a population, it can be any object with a length for which subscripting works. For example, it can be a vector of character strings.

If prob is supplied and replace=FALSE, then values are drawn sequentially with probabilities proportional to prob, excluding elements already drawn. If size>1, this does not give overall selection probabilities proportional to prob; the actual selection probabilities lie between those implied by prob and equal probabilities. Different permutations of the same set of outcomes also have different probabilities of being chosen.
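The effect of sequential drawing can be worked out exactly for a sample of two. The sketch below assumes only the sequential scheme described above; inclusion_prob_size2 is a hypothetical helper written for this illustration:

```r
# Exact probability that each element appears in a weighted sample of
# size 2 drawn sequentially without replacement.
# (Hypothetical helper for illustration; not part of the package.)
inclusion_prob_size2 <- function(prob) {
  prob <- prob / sum(prob)          # normalize to sum to one
  n <- length(prob)
  incl <- numeric(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      # P(first draw = i) * P(second draw = j | i already removed)
      p_ij <- prob[i] * prob[j] / (1 - prob[i])
      incl[i] <- incl[i] + p_ij
      incl[j] <- incl[j] + p_ij
    }
  }
  incl
}

prob <- c(0.5, 0.3, 0.2)
round(inclusion_prob_size2(prob), 3)   # 0.839 0.675 0.486
```

Selection strictly proportional to prob would give 1.0, 0.6, and 0.4, and equal probabilities would give 0.667 each; the computed values fall between the two. The order also matters: drawing element 1 then element 2 has probability 0.5 * 0.3 / 0.5 = 0.3, while element 2 then element 1 has probability 0.3 * 0.5 / 0.7, about 0.214.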

If size>n and replace=FALSE, an error occurs.

Value

sample.int	returns a sample from 1:n. This is a vector.
sample	returns the same as sample.int, if x is a positive integer.
If x is a vector	returns a sample of the elements.
If x is a matrix or an array	returns a sample of the elements (not rows!).
If x is a data frame	returns a sample of the columns (not rows!).
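Because sample works on elements (or columns, for a data frame), sampling rows needs an explicit index. One common approach, sketched here with the same objects used in the Examples, is to sample row numbers with sample.int and subscript with them:

```r
df <- as.data.frame(matrix(1:12, nrow = 3))
m  <- matrix(1:12, nrow = 3)

# Sample 2 of the 3 row indices, then subscript by row.
df_rows <- df[sample.int(nrow(df), 2), ]
m_rows  <- m[sample.int(nrow(m), 2), , drop = FALSE]  # drop=FALSE keeps a matrix

dim(df_rows)  # 2 4
dim(m_rows)   # 2 4
```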

Side Effects

The function sample creates the object .Random.seed if it does not already exist; if it exists, its value is updated.
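Because each call updates the generator state, repeated calls normally return different samples. A minimal sketch of making a draw reproducible, assuming set.seed is available and resets the state that .Random.seed holds (as it does in R):

```r
set.seed(42)              # fix the generator state
a <- sample.int(10, 3)

set.seed(42)              # restore the same state
b <- sample.int(10, 3)

identical(a, b)           # TRUE: the same state gives the same sample
```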

See Also

runif to generate uniformly distributed real numbers.

Examples

sample(Sdatasets::state.name, 10)  # pick 10 unique states at random
sample(1e6, 75)  # pick 75 numbers between 1 and one million
sample.int(50)  # random permutation of numbers 1:50
# Bernoulli(0.7) sample of size 100: P(0) = 0.3, P(1) = 0.7
sample(0:1, 100, TRUE, c(0.3, 0.7))
# 20 uniformly distributed numbers on the integers 1:10
# with replacement
sample.int(10, 20, replace=TRUE)
sample(5, 20, prob=c(0.3, 0.4, 0.1, 0.1, 0.1), replace=TRUE)
# Error: cannot take a sample larger than the population
# when 'replace = FALSE':
## Not run: 
sample.int(5, 20, prob = c(0.3, 0.4, 0.1, 0.1, 0.1))

## End(Not run)
df <- as.data.frame(matrix(1:12, nrow=3))
sample(df, 2) # pick two columns from df
sample(matrix(1:12, nrow=3), 8) # pick 8 elements (not rows) of the matrix
sample(c(3:8)) # random permutation of the values 3:8

Package base version 6.0.0-69