chisq.gof
Chi-Square Goodness-of-Fit Test

Description

Performs a chi-square goodness-of-fit test.

Usage

chisq.gof(x, n.classes = ceiling(2 * (length(x)^(2 / 5))), 
    cut.points = NULL, distribution = "normal", n.param.est = 0, ...)

Arguments

x numeric vector. NAs and Infs are allowed but will be removed.
n.classes the number of cells into which the observations are to be allocated. If the vector cut.points is supplied, then n.classes is set to length(cut.points) - 1. The default is recommended by Moore (1986).
cut.points vector of cutpoints that define the cells. x[i] is allocated to cell j if cut.points[j] < x[i] <= cut.points[j+1]. If x[i] is less than or equal to the first cutpoint or greater than the last cutpoint, then x[i] is treated as missing. If the hypothesized distribution is discrete, cut.points must be supplied.
distribution character string that specifies the hypothesized distribution. distribution can be one of: "normal", "beta", "cauchy", "chisquare", "exponential", "f", "gamma", "lognormal", "logistic", "t", "uniform", "weibull", "binomial", "geometric", "hypergeometric", "negbinomial", "poisson", or "wilcoxon". You need only supply the first characters that uniquely specify the distribution name. For example, "logn" and "logi" uniquely specify the lognormal and logistic distributions.
n.param.est number of parameters estimated from the data.
... parameters for the function that generates p-values for the hypothesized distribution.

Details

The chi-square test, introduced by Pearson in 1900, is the oldest and best known goodness-of-fit test. The idea is to reduce the goodness-of-fit problem to a multinomial setting by comparing the observed cell counts with their expected values under the null hypothesis. Grouping the data sacrifices information, especially if the underlying variable is continuous. On the other hand, chi-square tests can be applied to any type of variable: continuous, discrete, or a combination of these.
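The following sketch illustrates this reduction for a hypothesized standard normal distribution using only base R; it shows the idea of the computation, not the internal implementation of chisq.gof:

# Sketch: bin the data, compute cell probabilities under the hypothesized
# standard normal, and form Pearson's chi-square statistic.
x <- rnorm(100)
cuts <- c(-Inf, -1, 0, 1, Inf)                    # cell boundaries
observed <- table(cut(x, breaks = cuts))          # observed cell counts
p <- diff(pnorm(cuts))                            # cell probabilities under the null
expected <- length(x) * p                         # expected cell counts
stat <- sum((observed - expected)^2 / expected)   # Pearson's chi-square statistic
pchisq(stat, df = length(p) - 1, lower.tail = FALSE)  # p-value (no parameters estimated)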
Value
list of class "htest", containing the following components:
statistic: chi-square statistic, with names attribute "chisq".
parameters: degrees of freedom of the chi-square distribution associated with the statistic. Component parameters has names attribute "df".
p.value: p-value for the test.
data.name: character string (vector of length 1) containing the actual name of the input vector x.
counts: vector of the number of data points that fall into each cell.
expected: vector of counts expected under the null hypothesis.
Null hypothesis:
Let G(x) denote a distribution function. The null hypothesis is that G(x) is the true distribution function of x. The alternative hypothesis is that the true distribution function of x is not G(x).
Test statistic:
Pearson's chi-square statistic, the same statistic used in the function chisq.test. Asymptotically, this statistic has a chi-square distribution. If the hypothesized distribution function is completely specified, the degrees of freedom are m - 1, where m is the number of cells. If any parameters are estimated, the degrees of freedom depend on the method of estimation. The usual procedure is to estimate the parameters from the original (i.e., ungrouped) data and then subtract one degree of freedom for each parameter estimated. In fact, if the parameters are estimated by maximum likelihood, the degrees of freedom are bounded between (m-1-k) and (m-1), where k is the number of parameters estimated. Therefore, especially when the sample size is small, it is important to compare the test statistic to the chi-square distribution with both (m-1) and (m-1-k) degrees of freedom. See Kendall and Stuart (1979) for a more complete discussion.
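As an illustration (the numbers below are hypothetical), the two bounding p-values can be computed and compared directly:

# Compare the statistic against the chi-square distribution with both
# bounding degrees of freedom (illustrative values only).
stat <- 11.2   # observed chi-square statistic
m <- 8         # number of cells
k <- 2         # number of parameters estimated by maximum likelihood
pchisq(stat, df = m - 1,     lower.tail = FALSE)   # p-value with m - 1 df
pchisq(stat, df = m - 1 - k, lower.tail = FALSE)   # p-value with m - 1 - k df
# If both p-values lead to the same conclusion, the ambiguity in the
# degrees of freedom does not affect the decision.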
References
Kendall, M. G., and Stuart, A. (1979). The Advanced Theory of Statistics, Volume 2: Inference and Relationship (4th edition). New York: Oxford University Press. Chapter 30.
Moore, D. S. (1986). Tests of chi-squared type. In Goodness-of-Fit Techniques. R. B. D'Agostino and M. A. Stephens, eds. New York: Marcel Dekker.
Conover, W. J. (1980). Practical Nonparametric Statistics. New York: John Wiley and Sons. pp. 189-199.
Note
The distribution theory of chi-square statistics is a large-sample theory. The expected cell counts are assumed to be at least moderately large; as a rule of thumb, each should be at least 5. Although some authors have found this rule to be conservative (especially when the class probabilities are not too unequal), the user should regard p-values with caution when expected cell counts are small.
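One way to check this rule of thumb is to inspect the expected component of the returned object; a minimal sketch:

# z is any object returned by chisq.gof (here a default normal fit).
z <- chisq.gof(rnorm(100))
any(z$expected < 5)   # TRUE indicates one or more cells with small expected counts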
See Also
ks.test.
Examples
# generate an exponential sample 
x <- rexp(50, rate = 1.0) 
chisq.gof(x)  # hypothesize a normal distribution   
chisq.gof(x, dist = "exponential", rate = 1.0)  # hypothesize an exponential distn.   
x <- rpois(50, lambda = 3) 
breaks <- quantile(x) 
breaks[1] <- breaks[1] - 1   # want to include the minimum value 
z <- chisq.gof(x, cut.points = breaks, dist = "poisson", lambda = 3) 
z$counts 
z$expected 
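# A sketch of estimating a parameter from the data and accounting for it
# via n.param.est (assumes the estimated rate is passed through ... by name,
# as in the exponential example above).
x <- rexp(50, rate = 2.0)
rate.hat <- 1 / mean(x)   # maximum likelihood estimate of the rate
z <- chisq.gof(x, dist = "exponential", rate = rate.hat, n.param.est = 1)
z$parameters              # degrees of freedom reflect the estimated parameter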