chisq.test
Pearson's Chi-square Test for Count Data

Description

Performs a Pearson's chi-square test on a two-dimensional contingency table.

Usage

chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)),
    rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)

Arguments

x a factor or a two-dimensional contingency table in either matrix or data frame form (a data frame is coerced to a matrix with as.matrix).

If x is a contingency table, it must have at least two rows and two columns. All elements must be non-negative, and neither NAs nor Infs are allowed. The elements of the contingency table should be whole numbers, because the test is based on counts; however, because all computations are carried out to double precision accuracy, where possible, the storage mode of x is coerced to double.

If x is a factor, certain restrictions are imposed. See argument y for details.

y a factor object.
  • If x is a factor object, y is required and length(x) must equal length(y).
  • If x is a contingency table as either a matrix or a data frame, y is ignored.
The factors must have at least two levels. Missing values (NAs) are allowed, but in the case where an observation pair (x[i], y[i]) has at least one missing value NA, that observation pair is removed.

If x or y is not a factor object (and x is not a contingency table), it is implicitly coerced to one. In this case, pairs (x[i], y[i]) containing NAs are removed, but pairs containing Infs are not. This coercion is intended for datasets of mode numeric whose elements are typically small integers.

correct a logical scalar. If TRUE (the default) and simulate.p.value = FALSE, Yates' continuity correction is applied, but only for dichotomous categories (2 by 2 tables).
p a numeric vector, with the same length as x, that contains the probabilities. Elements with a negative value are not allowed. p is used to calculate the return value for expected.
rescale.p a logical value. If TRUE and sum(p) > 1, p is rescaled to sum to 1. Otherwise, if p does not sum to 1, the error "probabilities must sum to 1" is raised. The default is FALSE.
simulate.p.value a logical value. If TRUE, p-values are computed by Monte Carlo simulation. The default is FALSE.
B an integer specifying the number of replicates to use in the Monte Carlo test.

Details

For a two-dimensional contingency table x, Pearson's X-squared statistic is defined as
sum((abs(x - E) - YATES)^2 / E), where
E = outer(rowSums(x), colSums(x), "*") / sum(x) and
YATES = if(correct && nrow(x) == 2 && ncol(x) == 2) 0.5 else 0.
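The definition above can be traced by hand. The following is a minimal sketch, not the function's internal code; it reuses the 2 by 2 counts from the Examples section, and the names X2, df, and p.value are chosen here for illustration only.

```r
# Observed 2 x 2 table of counts (same counts as in the Examples section).
x <- matrix(c(6, 11, 9, 5), nrow = 2,
            dimnames = list(x = c("A", "B"), y = c("No", "Yes")))

# Expected counts under the null: row total * column total / grand total.
E <- outer(rowSums(x), colSums(x), "*") / sum(x)

# Yates' correction applies only to 2 x 2 tables when correct is TRUE.
correct <- TRUE
YATES <- if (correct && nrow(x) == 2 && ncol(x) == 2) 0.5 else 0

# Pearson's X-squared statistic and its degrees of freedom.
X2 <- sum((abs(x - E) - YATES)^2 / E)
df <- (nrow(x) - 1) * (ncol(x) - 1)

# Asymptotic p-value from the upper tail of the chi-square distribution.
p.value <- pchisq(X2, df, lower.tail = FALSE)

round(c(X2 = X2, df = df, p.value = p.value), 4)
```

For these counts the hand computation should agree with the X-squared value of 1.5534 on 1 degree of freedom reported in the Examples section.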

Value

The following components are returned:
statistic Pearson's X-squared statistic, with the names attribute "X-squared". See the Details section for the definition.
parameter the degrees of freedom of the asymptotic chi-square distribution associated with statistic, with the names attribute "df". For a contingency table with R rows and C columns, this is (R-1)*(C-1).
p.value the p-value for the test: asymptotic by default, or computed by Monte Carlo simulation if simulate.p.value = TRUE.
method a character string listing the name of the method, along with whether Yates' continuity correction was applied.
data.name a character string (vector of length 1) containing the name of the input argument x, and of y if both x and y are factor objects.
observed the observed counts. The value of x.
expected the expected counts under the null hypothesis.
residuals the Pearson residuals, whose value is (x - E)/sqrt(E), where E is expected.
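As an illustration of the returned components, the sketch below extracts a few of them with `$` and checks the stated relationship between residuals, observed, and expected. The counts are the ones from the Examples section; the variable names res and manual are assumptions made for this example.

```r
# 2 x 2 table of counts taken from the Examples section.
x <- matrix(c(6, 11, 9, 5), nrow = 2)

res <- chisq.test(x)

# Names attributes of the statistic and parameter components.
names(res$statistic)
names(res$parameter)

# Pearson residuals recomputed from the observed and expected counts.
manual <- (res$observed - res$expected) / sqrt(res$expected)

# TRUE if the recomputed residuals match the returned component.
isTRUE(all.equal(unname(res$residuals), unname(manual)))
```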

Null hypothesis

The expected cell counts are estimated as the products of the observed marginal totals divided by the table total. These expected counts are relevant to several types of null hypothesis: statistical independence of the rows and columns, homogeneity of groups, and so on. The appropriateness of the test to a particular null hypothesis and the interpretation of the results depend on the nature of the data at hand, in particular on the sampling scheme. For more information, see Fleiss (1981).

Test assumptions

Take care in the interpretation of the returned p.value, because its validity depends heavily on the assumption that the expected cell counts are at least moderately large; a minimum size of five is often quoted as a rule of thumb. Even when cell counts are adequate, the chi-square distribution is only a large-sample approximation to the true distribution of X-squared under the null hypothesis.

Indiscriminate use of chisq.test with arbitrary count data is discouraged. The null hypothesis (that is, the probability model), the sampling scheme, and the sizes of the counts all have bearing on the meaningfulness of the test, and some thought should be given to these.
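One way to act on the rule of thumb is to inspect the expected counts before relying on the asymptotic p-value. The sketch below is only an illustration under stated assumptions: the table sparse contains made-up counts, the threshold of five is the rule of thumb quoted above, and falling back to simulate.p.value = TRUE is one possible choice rather than a recommendation of this page.

```r
# Hypothetical sparse 2 x 2 table; the counts are made up for illustration.
sparse <- matrix(c(3, 1, 2, 6), nrow = 2)

# Expected counts under the null, computed as in the Details section.
E <- outer(rowSums(sparse), colSums(sparse), "*") / sum(sparse)

# Rule of thumb: flag the table if any expected count is below five.
too.small <- any(E < 5)

if (too.small) {
    # Monte Carlo p-value; the seed only makes the run repeatable.
    set.seed(1)
    res <- chisq.test(sparse, simulate.p.value = TRUE, B = 2000)
} else {
    res <- chisq.test(sparse)
}

res$p.value
```

For small tables, fisher.test (listed under See Also) is another option to consider.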

References

Fienberg, S. E. (1983). The Analysis of Cross-Classified Categorical Data, 2nd ed. Cambridge, Mass.: The MIT Press.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.

See Also

fisher.test, mantelhaen.test, mcnemar.test, factor, cut, table.

Examples

x <- factor(c(
    "A","B","A","A","B","B","B","A","B","B","B","B","B","A","B",
    "B","A","B","A","A","A","A","B","A","A","B","A", "B","B","A","A"))
y <- factor(c(
    "Yes","No","No","No","No","No","Yes","Yes","Yes","No",
    "No","Yes","No","Yes","No","No","Yes","Yes","Yes","No","Yes",
    "Yes","No","No","No","Yes","No","No","No","Yes","Yes"))
table(x, y)
#   y
# x   No Yes
#   A  6   9
#   B 11   5

chisq.test(x, y)
# Pearson's Chi-squared test with Yates' continuity correction
# data: x and y
# X-squared = 1.5534, df = 1, p-value = 0.2126

chisq.test(table(x, y))
# Pearson's Chi-squared test with Yates' continuity correction
# data: table(x, y)
# X-squared = 1.5534, df = 1, p-value = 0.2126
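The effect of the correct argument can be seen by running the test on the same counts with and without Yates' continuity correction. This is a small sketch; the names tab, with.yates, and without.yates are introduced here for illustration.

```r
# Same counts as the table printed above.
tab <- matrix(c(6, 11, 9, 5), nrow = 2,
              dimnames = list(x = c("A", "B"), y = c("No", "Yes")))

with.yates <- chisq.test(tab)                      # correct = TRUE is the default
without.yates <- chisq.test(tab, correct = FALSE)

# The uncorrected statistic is the larger of the two, because the
# correction shrinks each |observed - expected| by 0.5 before squaring.
c(corrected = unname(with.yates$statistic),
  uncorrected = unname(without.yates$statistic))
```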

Package stats version 6.1.1-7