prop.test
Proportions Tests

Description

Compares proportions against hypothesized values. Alternatively, tests whether underlying proportions are equal.

Usage

prop.test(x, n, p = NULL, alternative = c("two.sided", "less", "greater"), 
    conf.level = 0.95, correct = TRUE)

Arguments

x, n numeric vectors, a table, or a matrix that specifies the counts of successes and trials respectively.
  • vectors: length(x) must equal length(n), the elements in n must be positive, the elements in x must be non-negative, and the elements in x must be less than the corresponding values in n. Because the proportions tests are based on counts, the elements in x and n should be whole numbers; however, the storage mode of x and n will be coerced to double.
  • table: If x is a table, n (if specified) is ignored.
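  • matrix: If x is a matrix, n (if specified) is ignored. As in the matrix example in the Examples section, the two columns hold the counts of successes and failures.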
Missing values (NAs) and infinite values (Infs) are allowed, but in the case where an observation pair (x[i], n[i]) has at least one NA or Inf value, that observation pair is removed.
p vector of probabilities of success specified by the null hypothesis. length(p) must equal length(x) and length(n), and all elements must be greater than zero and less than one.
  • If p=NULL (the default) and there is only one group (length(x)==1), the null hypothesis tested is that the true probability of success is 0.5; however, if there is more than one group, the null hypothesis tested is that the true probability of success is the same in all groups.
  • If p is not NULL, the null hypothesis tested is that the vector of true probabilities of success is equal to p, regardless of the number of groups.
Missing values (NAs) and infinite values (Infs) are not allowed.
alternative a character string that specifies the alternative hypothesis. Possible values are: "two.sided", "greater", "less". Note: You need to enter only enough of the character string to create a unique match for the value.

In most cases, alternative is automatically set to two.sided. The values greater and less are meaningful only in the following two special cases.

  • If there is one group, alternative pertains to the true probability of success in relation to its value specified under the null hypothesis (see argument p).
  • If there are two groups and p=NULL, so that the null hypothesis tested is that the true probability of success is the same in both groups, then alternative pertains to the true probability of success in the first group in relation to that in the second.
conf.level a single numeric value between 0 and 1 that specifies the confidence level for the returned confidence interval.

conf.level is meaningful only when there is one group, or when p=NULL and there are two groups. (See the description of the alternative argument for more information.) In all other cases, conf.level is ignored.

correct a logical value. If TRUE (the default), Yates' continuity correction is applied, but only under certain conditions:
  • When there is only one group, the continuity correction may not exceed in magnitude the difference between the sample proportion x/n and the hypothesized true probability of success.
  • When there are two groups, and p=NULL, then the continuity correction may not exceed in magnitude the difference between the sample proportions.
The continuity correction is never applied when there are more than two groups. See the Details section for an algebraic definition of the continuity correction.
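
The rule for the one-group case can be checked numerically. The following is a minimal sketch, not a definitive statement of the implementation: it assumes the correction is min(0.5, abs(x - n*p)) on the count scale (one reading of the bound stated above) and that it is subtracted from abs(x - n*p) before squaring. The variable names yates and stat_manual are illustrative only.

```r
# One group: 5 successes in 10 trials, null probability 0.6
x <- 5; n <- 10; p0 <- 0.6

# Assumed form of the correction: at most 0.5, and never larger than
# the observed deviation |x - n*p0| on the count scale
yates <- min(0.5, abs(x - n * p0))

# Corrected Pearson statistic for the 1 x 2 table (successes, failures)
stat_manual <- (abs(x - n * p0) - yates)^2 / (n * p0 * (1 - p0))

res <- prop.test(x, n, p = p0, correct = TRUE)

# TRUE if the assumed formula matches the statistic returned by prop.test
isTRUE(all.equal(unname(res$statistic), stat_manual))
```

If the comparison returns FALSE, the assumed form of the correction does not match the implementation in use.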

Details

Suppose that all elements of x and n are valid numbers (that is, they are not NA, Inf, or other special values), so that the number of groups used in the test is given by length(x). Conceptually, the data can be arranged in a length(x) by 2 table, where rows correspond to groups (samples), and columns to "success" or "failure" respectively. Thus the entry in the i-th row and j-th column is x[i] if j==1 or n[i] - x[i] if j==2.
  1. Testing if Probabilities of Success Equal Those Specified in p

    To test the null hypothesis that the true probabilities of success equal those specified in input argument p (or 0.5 if p=NULL in the case of only one group), Pearson's X-squared statistic is computed for the above table, with expected counts of successes given by n*p and expected counts of failures by n*(1-p). Under the null hypothesis, the X-squared statistic has an asymptotic chi-square distribution with length(x) degrees of freedom.

    When there is only one group, X-squared coincides with the square of the Z statistic used to compare a proportion with a specified value.

  2. Testing if All Probabilities of Success Are the Same

    To test the hypothesis that the true probability of success is the same in each of the length(x) > 1 groups (the default when p=NULL), Pearson's X-squared statistic is again used with the above table, this time with expected counts of successes estimated by n*(sum(x)/sum(n)) and expected counts of failures by n*(1-sum(x)/sum(n)). This estimates the (common) probability of success as the total number of observed successes divided by the total number of trials. Under the null hypothesis, X-squared has an asymptotic chi-square distribution with length(x)-1 degrees of freedom. It can be shown that X-squared computed this way is algebraically equivalent to X-squared for the hypothesis of independence between the row and column attributes of the table. Furthermore, when there are just two groups, the statistic coincides with the square of the Z statistic used to compare two proportions.
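
The relationships described in both cases can be verified by hand. The sketch below reuses the coin and smoker data from the Examples section; correct=FALSE is set in the one-group case so that the continuity correction does not enter the comparison. The names z, p_pool, observed, expected, and x2 are illustrative only.

```r
# Case 1, one group: X-squared equals the square of the Z statistic
x <- 5; n <- 10; p0 <- 0.6
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
one <- prop.test(x, n, p = p0, correct = FALSE)
isTRUE(all.equal(unname(one$statistic), z^2))

# Case 2, several groups: Pearson's X-squared with a pooled estimate
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)
p_pool   <- sum(smokers) / sum(patients)

observed <- cbind(smokers, patients - smokers)
expected <- cbind(patients * p_pool, patients * (1 - p_pool))
x2 <- sum((observed - expected)^2 / expected)

# No continuity correction is applied with more than two groups
many <- prop.test(smokers, patients)
isTRUE(all.equal(unname(many$statistic), x2))

# The same statistic arises from the test of independence on the table
indep <- chisq.test(observed, correct = FALSE)
isTRUE(all.equal(unname(indep$statistic), x2))
```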

Value

Returns a list of class htest containing the following components:
statistic the X-squared statistic.
parameters the degrees of freedom of the asymptotic chi-square distribution associated with the X-squared statistic.
p.value the asymptotic p-value for the test.
conf.int a confidence interval, returned only in the following two cases:
  • If there is one group, conf.int contains a confidence interval for the true probability of success.
  • If there are two groups and input argument p=NULL, conf.int contains a confidence interval for the difference in probabilities of success between the first and second groups.
In both cases, the confidence level is recorded in the attribute conf.level. In all other cases, conf.int is not returned.
estimate a numeric vector that contains the sample proportions x / n, which estimate the true probabilities of success in the corresponding groups. When there is only one group, the names attribute is p; when there are two or more groups, the names attribute is prop 1, prop 2, and so on.
null.value when the null hypothesis is that the true probabilities of success equal specified values (usually input argument p), the component null.value records these specified values, and returns them along with a names attribute as described under component estimate. In all other cases, null.value is not returned.
alternative a character string that returns the alternative hypothesis (two.sided, greater, or less) as specified in the alternative argument.

If there is only one group, or when there are two groups and the argument p=NULL, alternative returns the actual value specified for the alternative argument. In all other cases, alternative returns two.sided.

method a character string that returns the name of the method used, including whether Yates' continuity correction was applied.
data.name a character string that contains the actual names of the input vectors x, n, and of p, if given.
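
As a brief illustration of the components described above, the following sketch inspects the returned list for a two-group test and for a four-group test, using data from the Examples section. The object names two and many are illustrative only.

```r
# Two groups, p = NULL: a confidence interval for the difference is returned
two <- prop.test(c(12, 15), c(20, 20), alternative = "greater")
two$estimate                       # sample proportions, named "prop 1" and "prop 2"
two$conf.int                       # interval for (Group 1 minus Group 2)
attr(two$conf.int, "conf.level")   # confidence level stored as an attribute
two$alternative                    # the value passed in: "greater"

# Four groups: no confidence interval, alternative reported as two.sided
many <- prop.test(c(83, 90, 129, 70), c(86, 93, 136, 82))
is.null(many$conf.int)
many$alternative
many$p.value
```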
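
Null hypothesis

Two types of null hypothesis can be tested:
  • The true probabilities of success equal those specified in p (or 0.5 if p=NULL and there is only one group).
  • The true probability of success is the same in all groups (p=NULL and more than one group).

Test assumptions
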
The function operates on the assumption that each of the length(x) samples is independent of the others, and that each sample consists of a predetermined number n[i] of independent trials, for which the true probability of success is constant. Furthermore, the p-value is based on an approximation which works best when none of the probabilities of success is close to zero or one, and when the numbers of trials n[i] are not too small. At the very minimum, all (estimated) expected counts of successes or failures should be at least five.
For details on the approximation and the definition of expected counts see the Details section.
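
The minimum-count guideline can be checked before running the test. The sketch below uses a hypothetical helper, expected_counts (not part of the package), that applies the definitions of expected counts from the Details section.

```r
# Hypothetical helper: expected successes and failures under the null.
# If p is NULL, use 0.5 for one group, else the pooled proportion.
expected_counts <- function(x, n, p = NULL) {
  if (is.null(p)) {
    p <- if (length(x) == 1) 0.5 else sum(x) / sum(n)
  }
  cbind(success = n * p, failure = n * (1 - p))
}

# Smoker data: every expected count is at least five
e1 <- expected_counts(c(83, 90, 129, 70), c(86, 93, 136, 82))
all(e1 >= 5)

# Null probabilities of 0.9 with about 20 trials per group:
# the expected failure counts fall below five
e2 <- expected_counts(c(19, 20, 18), c(21, 25, 23), p = rep(0.9, 3))
all(e2 >= 5)
```

When the check fails, as in the second call, the chi-square approximation is questionable and an exact method such as binom.test or fisher.test may be more appropriate.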

References

Fienberg, S. E. 1983. The Analysis of Cross-Classified Categorical Data.
Fleiss, J. L. 1981. Statistical Methods for Rates and Proportions. Second Edition. New York, NY: Wiley.
Newcombe, R. G. 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine. Volume 17. 873-890.
Newcombe, R. G. 1998. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine. Volume 17. 857-872.
Snedecor, G. W. and Cochran, W. G. 1980. Statistical Methods. Seventh Edition. Ames, IA: Iowa State University Press.
Wilson, E. B. 1927. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. Volume 22. 209-212.

See Also

binom.test, chisq.test, fisher.test, Binomial.

Examples

# (a) Testing Whether Probabilities of Success Equal Those Specified in 'p'.

# The null hypothesis is that the probability of heads for
# this coin is 0.6. The alternative is two-sided. A
# confidence interval for the true probability of heads
# will be computed.
heads <- 5; tosses <- 10
prop.test(heads, tosses, 0.6)

# Same as above, but now the null probability is 0.5, the
# default for 'p' when there is only one group. This is
# a test that the coin is unbiased.
prop.test(heads, tosses)

# The null hypothesis is that all probabilities of success
# are equal to 0.9. The alternative is that at least one of
# them isn't.
successes <- c(19, 20, 18); trial.counts <- c(21, 25, 23)
prop.test(successes, trial.counts, rep(0.9, times=length(successes)))

# (b) Testing Whether All Probabilities of Success are the Same.

# The null hypothesis is that the incidence probabilities
# in the two groups are the same. The alternative is that
# the probability in Group 1 exceeds that in Group 2.
# A confidence interval for the difference in the true
# probabilities (Group 1 minus Group 2) will be computed.
incidence.counts <- c(12, 15); group.sizes <- c(20, 20)
prop.test(incidence.counts, group.sizes, alternative="greater")

# Data from Fleiss (1981), p. 139. The null hypothesis is
# that the four populations from which the patients were
# drawn have the same true proportion of smokers. The
# alternative is that this proportion is different in at
# least one of the populations.
smokers <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)
prop.test(smokers, patients)

# Matrix format, the result is the same as above
prop.test(cbind(smokers, patients-smokers))

Package stats version 6.1.1-7