fisher.test
Fisher's Exact Test for Count Data

Description

Performs Fisher's exact test on a two-dimensional contingency table.

Usage



fisher.test(x, y = NULL, workspace = 2e+05, hybrid = FALSE,
    control = list(), or = 1, alternative = "two.sided",
    conf.int = TRUE, conf.level = 0.95, simulate.p.value = FALSE,
    B = 2000)

Arguments

x either a factor object or a two-dimensional contingency table in matrix form. If x is a matrix, each dimension must be at least 2. All elements must be non-negative or NA; Inf values are not allowed. The elements of a matrix x should be whole numbers, because the test is based on counts; the storage mode of x is coerced to "integer". For restrictions on x when it is a factor object, see argument y.
y a factor object. If x is a matrix, y is ignored. If x is a factor object, y is required and must have the same length as x. Each object must have at least 2 levels. NAs in the category index vectors are allowed, but pairs (x[i],y[i]) containing them are removed. Each element of the index vectors of x and y should give the membership of that observation in one of the groups present in the levels attribute; an NA in an index vector means that the observation is not in any of the groups listed for that factor object. Infs have no meaning as indices and should not be present.

If x and y are both present and either one is not a factor object (and x is not a matrix), it is implicitly coerced to a factor. In this case, pairs (x[i],y[i]) containing NAs are removed, but pairs containing Infs are not. Coercion of x and y in this manner is intended for datasets of mode numeric whose elements are typically small integers; data in the form of character vectors should first be converted to factor objects.

workspace an integer specifying the size of the workspace used by Algorithm 643 (FEXACT), which uses the network algorithm to compute Fisher's exact test probabilities, and a hybrid approximation to them, for a contingency table. The size is given in units of half a double (sizeof(double)/2). Used only for a non-simulated p-value on a table larger than 2 by 2.
hybrid a logical flag. If TRUE, a hybrid algorithm is used, which involves an approximation. The default is FALSE. See Mehta and Patel (1986) and Clarkson, Fan and Joe (1993). Used only for a non-simulated p-value on a table larger than 2 by 2.
control a named list of control parameters for Algorithm 643. At present, the only one used is "mult", which specifies how many times as much space should be allocated to index keys. It must be an integer of at least 2 (the default is 30). Used only for tables larger than 2 by 2.
or a number specifying the hypothesized odds ratio. Used only for a 2 by 2 contingency table.
alternative a character string that specifies the alternative hypothesis. Must be one of "two.sided", "less" or "greater". A partial string is accepted if it matches uniquely. Used only for a 2 by 2 contingency table. At this time, only the "two.sided" value is supported. If "less" or "greater" is specified, a warning is issued and alternative is set to "two.sided".
conf.int a logical value. If TRUE (the default), a confidence interval for the odds ratio is computed. Used only for a 2 by 2 contingency table.
conf.level a single number between 0 and 1 that specifies the confidence level for the returned confidence interval. Used only for a 2 by 2 contingency table and only if conf.int is TRUE.
simulate.p.value a logical value. If TRUE, the p-value is computed by Monte Carlo simulation. The default is FALSE. Used only for tables larger than 2 by 2.
B an integer specifying the number of replicates to use in the Monte Carlo test. Used only if simulate.p.value is TRUE.
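The handling of non-factor inputs and NAs described above can be checked with a small sketch. The data reuse the vectors from the Examples section with one extra pair containing an NA (an addition made for this illustration); the variable names are arbitrary.

```r
# Numeric vectors of small integers; the last pair contains an NA
xn <- c(1, 1, 2, 1, 2, 1, 1, 2, 2, NA)
yn <- c(1, 1, 1, 2, 1, 2, 2, 1, 1, 2)

# Non-factor inputs are coerced to factors, and pairs with an NA are dropped
res_num <- fisher.test(xn, yn)

# The same analysis with the incomplete pair removed and factors built by hand
ok <- complete.cases(xn, yn)
res_fac <- fisher.test(factor(xn[ok]), factor(yn[ok]))

# Both calls should report the same p-value
all.equal(res_num$p.value, res_fac$p.value)
```

Character vectors are deliberately not used here, since the text above recommends converting them to factor objects first.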

Details

For tables larger than 2 by 2, the algorithm used in fisher.test (Algorithm 643, FEXACT) is based on theory from Mehta and Patel (1983, 1986), Joe (1985, 1988), and Clarkson, Fan and Joe (1993). For simulated p-values, Algorithm AS 159 from Patefield (1981) is used instead.
For the 2 by 2 case, the null hypothesis of conditional independence is equivalent to the hypothesis that the odds ratio equals 1. In general, the conditional distribution of the first cell count given the marginals is a non-central hypergeometric distribution H whose non-centrality parameter (ncp) is the odds ratio. The conditional maximum likelihood estimate (MLE) of ncp is calculated by solving E(X) = x, where X has distribution H and x is the observed count. The confidence interval for the odds ratio is given by the corresponding lower and upper bounds on ncp.
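As a rough illustration of the estimating equation E(X) = x, the sketch below rebuilds the conditional MLE by hand for the 2 by 2 matrix used in the Examples section. The helper names (dnhyper, mean_nhyper) and the root-finding interval are assumptions made for this sketch; it is not claimed to reproduce the internal implementation.

```r
tab <- matrix(c(4, 7, 20, 12), ncol = 2)
m  <- sum(tab[, 1])   # first column total
n  <- sum(tab[, 2])   # second column total
k  <- sum(tab[1, ])   # first row total
x0 <- tab[1, 1]       # observed count in cell [1, 1]

# Values the [1, 1] cell can take once the margins are fixed
support <- max(0, k - n):min(k, m)
# Log-probabilities of the central hypergeometric distribution (odds ratio 1)
logdc <- dhyper(support, m, n, k, log = TRUE)

# Hypothetical helper: non-central hypergeometric probabilities for odds ratio ncp
dnhyper <- function(ncp) {
  d <- logdc + log(ncp) * support
  d <- exp(d - max(d))          # rescale before normalizing to avoid overflow
  d / sum(d)
}
# Hypothetical helper: E(X) under the non-central hypergeometric distribution
mean_nhyper <- function(ncp) sum(support * dnhyper(ncp))

# Solve E(X) = x0 on the log odds ratio scale
root <- uniroot(function(l) mean_nhyper(exp(l)) - x0,
                interval = c(-10, 10), tol = 1e-10)$root
mle <- exp(root)

# Compare with the estimate reported by fisher.test
c(by_hand = mle, fisher.test = unname(fisher.test(tab)$estimate))
```

The two numbers should agree up to the tolerance of the root finders. The sketch assumes the observed count lies strictly inside the support; at either boundary the equation has no finite positive solution.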

Value

Returns a list of class "htest" containing the following components:

p.value the p-value for the test.
conf.int a confidence interval for the odds ratio. Returned only when the data form a 2 by 2 table and the argument conf.int is TRUE.
estimate an estimate of the odds ratio, returned only when the data form a 2 by 2 table. The value is the conditional maximum likelihood estimate.
null.value the odds ratio under the null hypothesis of independence. Present only when the data form a 2 by 2 table.
alternative a character string describing the alternative hypothesis used in the test.
method a character string giving the name of the method used. Additional text is appended if simulate.p.value is TRUE for a table larger than 2 by 2.
data.name a character string (vector of length 1) containing the actual name of the input argument x, and of y if both are factor objects.
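A minimal sketch of how the returned components might be inspected, using the 2 by 2 matrix from the Examples section:

```r
tab <- matrix(c(4, 7, 20, 12), ncol = 2)
res <- fisher.test(tab)

class(res)        # "htest"
res$p.value       # p-value of the test
res$estimate      # conditional MLE of the odds ratio (2 by 2 tables only)
res$conf.int      # confidence interval for the odds ratio (2 by 2 tables only)
res$null.value    # hypothesized odds ratio
res$alternative   # alternative hypothesis actually used
res$method        # name of the method
res$data.name     # name of the input data
```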

Null hypothesis

Fisher's exact test is typically used to test the null hypothesis of independence between the row and column variables of the table. Certain types of homogeneity (for example homogeneity of proportions in a k by 2 table) are equivalent to the independence hypothesis. See the literature references for examples.
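As an illustration of the homogeneity-of-proportions reading for a k by 2 table, the sketch below treats the rows of the 3 by 2 matrix from the Examples section as three groups and the columns as two outcomes. The dimnames ("g1", "yes", and so on) are hypothetical labels added only for readability.

```r
Z <- matrix(c(3, 5, 8, 6, 2, 7), ncol = 2,
            dimnames = list(group = c("g1", "g2", "g3"),
                            outcome = c("yes", "no")))

# Observed proportion of each outcome within each group
prop.table(Z, margin = 1)

# Exact test of independence, equivalently of equal "yes" proportions across groups
exact <- fisher.test(Z)

# Monte Carlo version of the same test; the seed makes the run repeatable
set.seed(1)
sim <- fisher.test(Z, simulate.p.value = TRUE, B = 2500)

c(exact = exact$p.value, simulated = sim$p.value)
```

The simulated p-value is only an approximation to the exact one and varies with the seed and with B.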

Test assumptions

Unlike many tests for categorical data, whose test statistics have a known distribution only asymptotically, Fisher's exact test does not require the cell counts to be large. However, because the test proceeds by conditioning on the marginal totals, it is important that this conditioning have a meaningful interpretation relative to the sampling scheme governing data collection.

Differences between TIBCO Enterprise Runtime for R and Open-source R

R allows alternative to be "less" or "greater" as well as the default value of "two.sided". These alternatives are not yet implemented in TIBCO Enterprise Runtime for R. If one of the unimplemented alternatives is specified, a warning is issued and the default ("two.sided") is used.
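Code that must run on both engines can check which alternative was actually used. The following is a sketch, assuming only the behavior described above; the handler simply reports any warning and lets the call continue.

```r
tab <- matrix(c(4, 7, 20, 12), ncol = 2)

res <- withCallingHandlers(
  fisher.test(tab, alternative = "less"),
  warning = function(w) {
    # Report the warning (expected where "less" is not implemented) and continue
    message("fisher.test warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# "less" where one-sided tests are implemented, "two.sided" otherwise
res$alternative
```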

References

(a) Statistical Theory
Bishop, Y. M. M., Fienberg, S. J., and Holland, P. W. (1980). Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: The MIT Press.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.
Zar, J. H. (1984). Biostatistical Analysis, 2nd ed. Englewood Cliffs: Prentice-Hall.
Agresti, A. (1990). Categorical Data Analysis. New York: Wiley. Pages 59-66.
Agresti, A. (2002). Categorical Data Analysis, 2nd ed. New York: Wiley. Pages 91-101.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society Series A 98, 39-54.
Fisher, R. A. (1962). Confidence limits for a cross-product ratio. Australian Journal of Statistics 4, 41.
Fisher, R. A. (1970). Statistical Methods for Research Workers. Oliver & Boyd.

(b) Computer Algorithm
Joe, H. (1985). An ordering of dependence for contingency tables. Linear Algebra and its Applications 70, 89-103.
Joe, H. (1988). Extreme probabilities for contingency tables under row and column independence with application to Fisher's exact test. Communications in Statistics A, Theory and Methods 17, 3677-3685.
Mehta, C. R. and Patel, N. R. (1983). A network algorithm for performing Fisher's exact test in r x c contingency tables. Journal of the American Statistical Association 78, 427-434.
Mehta, C. R. and Patel, N. R. (1986). Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r x c contingency tables. ACM Transactions on Mathematical Software 12, 154-161.
Mehta, C. R. and Patel, N. R. (1986). A hybrid algorithm for Fisher's exact test in unordered r x c contingency tables. Communications in Statistics A, Theory and Methods 15, 387-404.
Clarkson, D. B., Fan, Y. and Joe, H. (1993). A remark on Algorithm 643: FEXACT: An algorithm for performing Fisher's exact test in r x c contingency tables. ACM Transactions on Mathematical Software 19, 484-488.
Patefield, W. M. (1981). Algorithm AS 159. An efficient method of generating r x c tables with given row and column totals. Applied Statistics 30, 91-97.

See Also

chisq.test, mantelhaen.test, mcnemar.test, cut, table, Hypergeometric.

Examples

x <- factor(c(1,1,2,1,2,1,1,2,2), labels=c("A", "Abar"))
y <- factor(c(1,1,1,2,1,2,2,1,1), labels=c("B", "Bbar"))

table(x, y)                # table from Fleiss, p. 25
fisher.test(x, y)
fisher.test(table(x, y))   # same thing

x <- matrix(c(4, 7, 20, 12), ncol=2)
fisher.test(x)
fisher.test(x, conf.level=0.85)
fisher.test(x, conf.int=FALSE)
fisher.test(x, or=0.9)

Z <- matrix(c(3, 5, 8, 6, 2, 7), ncol=2)   # non 2 by 2 case
fisher.test(Z)
fisher.test(Z, simulate.p.value=TRUE, B=2500)

Package stats version 6.0.0-69