chisq.gof
Chi-Square Goodness-of-Fit Test
Description
Performs a chi-square goodness-of-fit test.
Usage
chisq.gof(x, n.classes = ceiling(2 * (length(x)^(2 / 5))),
cut.points = NULL, distribution = "normal", n.param.est = 0, ...)
Arguments
x: numeric vector. NAs and Infs are allowed but will be removed.

n.classes: the number of cells into which the observations are to be allocated. If the vector cut.points is supplied, then n.classes is set to length(cut.points) - 1. The default is recommended by Moore (1986).

cut.points: vector of cutpoints that define the cells. x[i] is allocated to cell j if cut.points[j] < x[i] <= cut.points[j+1]. If x[i] is less than or equal to the first cutpoint or greater than the last cutpoint, then x[i] is treated as missing. If the hypothesized distribution is discrete, cut.points must be supplied.

distribution: character string that specifies the hypothesized distribution. distribution can be one of: "normal", "beta", "cauchy", "chisquare", "exponential", "f", "gamma", "lognormal", "logistic", "t", "uniform", "weibull", "binomial", "geometric", "hypergeometric", "negbinomial", "poisson", or "wilcoxon". You need supply only the first characters that uniquely specify the distribution name. For example, "logn" and "logi" uniquely specify the lognormal and logistic distributions.

n.param.est: number of parameters estimated from the data.

...: parameters of the hypothesized distribution (for example, rate = 1.0), passed to the function that computes probabilities for that distribution.
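The half-open-interval allocation rule for cut.points can be sketched as follows. This is an illustrative Python sketch only; allocate_cells is a hypothetical helper and is not part of chisq.gof, which is an S-PLUS/R function:

```python
def allocate_cells(x, cut_points):
    """Return the cell index (1-based) for each value, or None when the
    value is at or below the first cutpoint or above the last one (such
    values are treated as missing)."""
    cells = []
    for xi in x:
        cell = None
        for j in range(len(cut_points) - 1):
            # x[i] goes to cell j if cut.points[j] < x[i] <= cut.points[j+1]
            if cut_points[j] < xi <= cut_points[j + 1]:
                cell = j + 1
                break
        cells.append(cell)
    return cells

cuts = [0.0, 1.0, 2.0, 3.0]   # three cells: (0,1], (1,2], (2,3]
print(allocate_cells([0.5, 1.0, 2.5, 0.0, 3.5], cuts))
# -> [1, 1, 3, None, None]: 0.0 is <= the first cutpoint and 3.5 is > the
#    last cutpoint, so both are treated as missing
```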
Details
The chi-square test, introduced by Pearson in 1900, is the oldest and best-known goodness-of-fit test. The idea is to reduce the goodness-of-fit problem to a multinomial setting by comparing the observed cell counts with their expected values under the null hypothesis. Grouping the data sacrifices information, especially if the underlying variable is continuous. On the other hand, chi-square tests can be applied to any type of variable: continuous, discrete, or a combination of the two.
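The comparison of observed and expected cell counts can be sketched as follows (Python used purely for illustration; the counts below are invented, and pearson_chisq is a hypothetical helper, not part of chisq.gof):

```python
def pearson_chisq(observed, expected):
    """Pearson's chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 30, 30]   # counts that fell into each of 4 cells
expected = [25, 25, 25, 25]   # equal cell probabilities under the null
print(pearson_chisq(observed, expected))   # -> 4.32
```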
Value
list of class "htest", containing the following components:

statistic: chi-square statistic, with names attribute "chisq".

parameters: degrees of freedom of the chi-square distribution associated with the statistic. Component parameters has names attribute "df".

p.value: p-value for the test.

data.name: character string (vector of length 1) containing the actual name of the input vector x.

counts: vector of the number of data points that fall into each cell.

expected: vector of counts expected under the null hypothesis.
Null hypothesis:
Let G(x) denote a distribution function.
The null hypothesis is that G(x) is the true
distribution function of x. The alternative hypothesis
is that the true distribution function of x is not G(x).
Test statistic:
Pearson's chi-square statistic, the same as that used in the function
chisq.test.
Asymptotically, this statistic has a chi-square distribution.
If the hypothesized distribution function is completely specified,
the degrees of freedom are m - 1, where m is the number of cells.
If any parameters are estimated, the degrees of freedom depend on
the method of estimation. The usual procedure is to estimate the
parameters from the original (i.e., not grouped) data, and then to
subtract one degree of freedom for each parameter estimated.
In fact, if the parameters are estimated by maximum likelihood, the
degrees of freedom are bounded between (m-1) and (m-1-k), where k is the
number of parameters estimated. Therefore,
especially when the sample size is small,
it is important to compare the test statistic to the
chi-square distribution with both (m-1) and (m-1-k) degrees of freedom.
See Kendall and Stuart (1979) for a more complete discussion.
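The recommendation above can be sketched numerically. In this hypothetical Python illustration the chi-square upper-tail probability is computed with the Wilson-Hilferty cube-root normal approximation (an assumption of the sketch; the real test uses the exact chi-square distribution):

```python
import math

def chisq_upper_tail(stat, df):
    """Approximate P(X > stat) for X ~ chi-square(df), using the
    Wilson-Hilferty cube-root normal approximation."""
    mean = 1 - 2 / (9 * df)
    sd = math.sqrt(2 / (9 * df))
    z = ((stat / df) ** (1 / 3) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # standard normal upper tail

# Hypothetical situation: m = 10 cells, k = 2 parameters estimated by
# maximum likelihood, observed chi-square statistic 15.0.
m, k, stat = 10, 2, 15.0
p_hi = chisq_upper_tail(stat, m - 1)       # df = m - 1     (roughly 0.09)
p_lo = chisq_upper_tail(stat, m - 1 - k)   # df = m - 1 - k (roughly 0.04)
# The true p-value lies between p_lo and p_hi; when both sit on the same
# side of the chosen significance level, the df ambiguity is harmless.
print(p_lo, p_hi)
```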
References
Conover, W. J. (1980).
Practical Nonparametric Statistics.
New York: John Wiley and Sons. pp. 189-199.
Kendall, M. G., and Stuart, A. (1979).
The Advanced Theory of Statistics, Volume 2: Inference and Relationship
(4th edition).
New York: Oxford University Press. Chapter 30.
Moore, D. S. (1986). Tests of chi-squared type. In
Goodness-of-Fit Techniques
(R. B. D'Agostino and M. A. Stephens, eds.).
New York: Marcel Dekker.
Note
The distribution theory of chi-square statistics is a large-sample
theory. The expected cell counts are assumed to be at least moderately
large; as a rule of thumb, each should be at least 5. Although some
authors have found this rule to be conservative (especially when the
class probabilities are not too unequal), the user should regard
p-values with caution when expected cell counts are small.
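The rule of thumb can be checked mechanically before trusting the p-value. A minimal sketch in Python (small_cells is a hypothetical helper, not part of chisq.gof):

```python
def small_cells(expected, threshold=5):
    """Indices (0-based) of cells whose expected count falls below the
    rule-of-thumb threshold of 5."""
    return [j for j, e in enumerate(expected) if e < threshold]

expected = [12.5, 8.0, 4.2, 0.9]
print(small_cells(expected))   # -> [2, 3]: those two cells are suspect
```

Cells flagged this way are often merged with a neighbor before the test is rerun.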
See Also
chisq.test.
Examples
# generate an exponential sample
x <- rexp(50, rate = 1.0)
chisq.gof(x) # hypothesize a normal distribution
chisq.gof(x, dist = "exponential", rate = 1.0) # hypothesize an exponential distn.
x <- rpois(50, lambda = 3)
breaks <- quantile(x)
breaks[1] <- breaks[1] - 1 # want to include the minimum value
z <- chisq.gof(x, cut.points = breaks, dist = "poisson", lambda = 3)
z$counts
z$expected