loglin
Contingency Table Analysis
Description
Computes test statistics and estimates parameter values for a log-linear
analysis of a multidimensional contingency table.
Usage
loglin(table, margin, start = rep(1, length(table)), fit = FALSE,
eps = 0.1, iter = 20, param = FALSE, print = TRUE)
Arguments
table |
a contingency table (array) to be fit by a log-linear model.
Typically, table is output from the table function.
Neither negative nor missing values (NAs) are allowed.
|
margin |
a list of vectors describing the marginal totals to fit. A margin
is described by the factors not summed over.
Thus list(1:2, 3:4) would indicate
fitting the 1,2 margin (summing over variables 3 and 4) and
the 3,4 margin in a four-way table.
The names of factors (that is, names(dimnames(table)))
can be used instead of indices.
|
start |
the starting estimate for the fitted table. If start is omitted, a
start is used that assures convergence. If structural
zeros appear in table, start should contain zeros in the
corresponding entries and ones everywhere else. This assures
that the fit contains those zeros (see the sketch following this list).
|
fit |
a logical value. If TRUE, the estimated fit is returned. The
default is FALSE.
|
eps |
the maximum permissible deviation between an observed margin and a fitted
margin.
|
iter |
the maximum number of iterations.
|
param |
a logical value. If TRUE, the parameter values are returned.
Setting this to FALSE (the default) saves computation as well as space.
|
print |
a logical value. If TRUE (the default), the final deviation and the number
of iterations are printed.
|
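As a minimal sketch of the start argument, consider a small hypothetical 2 x 3 table whose cell [1, 3] is a structural zero; placing a zero in the corresponding entry of start keeps that cell at zero in the fit:
counts <- array(c(12, 7, 9, 4, 0, 11), dim = c(2, 3),
                dimnames = list(row = c("a", "b"), col = c("x", "y", "z")))
strt <- array(1, dim = dim(counts))
strt[1, 3] <- 0   # cell [1, 3] is a structural zero, so start is zero there
res <- loglin(counts, margin = list(1, 2), start = strt, fit = TRUE)
res$fit[1, 3]     # the fitted table keeps this cell at zero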
Details
The fit is produced by the Iterative Proportional Fitting algorithm as
presented in Haberman (1972).
Convergence is considered to be achieved if the maximum deviation between
an observed and a fitted margin is less than eps.
At most, iter iterations are performed.
The fitting is currently done in single
precision; other computations are in double precision.
The margins that are fit describe the model, much as terms describe an ANOVA
model. A high-order term automatically includes all the lower-order terms
within it: for example, the term c(1,3) includes the one-factor terms
1 and 3. A factor that had constraints in the sampling plan should
always be included. For example, if the sampling plan was such that there
would be (precisely) x females and y males sampled, then gender should
be in all models.
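For example, assuming the standard HairEyeColor table from the datasets package is available (a 4 x 4 x 2 table of Hair, Eye and Sex), two common models can be written as:
# Conditional independence of Hair and Eye given Sex: fit the (Hair, Sex) and
# (Eye, Sex) margins; each term also brings in its one-factor terms.
ci <- loglin(HairEyeColor, margin = list(c("Hair", "Sex"), c("Eye", "Sex")))
# Mutual independence: only the three one-factor margins are fit.
ind <- loglin(HairEyeColor, margin = list("Hair", "Eye", "Sex"))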
Both the LRT and the Pearson test statistics are asymptotically distributed
as chi-squared with df degrees of freedom (assuming there are no zeros).
A general rule of thumb is that the asymptotic distribution is trustworthy
when the number of observations is at least 10 times the number of cells.
If the two test statistics differ considerably, not much faith can be put
in the test.
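For instance, continuing with the assumed HairEyeColor table, the returned statistics and degrees of freedom give the usual p-values:
ci <- loglin(HairEyeColor, margin = list(c("Hair", "Sex"), c("Eye", "Sex")))
pchisq(ci$lrt, df = ci$df, lower.tail = FALSE)       # LRT p-value
pchisq(ci$pearson, df = ci$df, lower.tail = FALSE)   # Pearson p-value
# A large gap between the two p-values is the disagreement warned about above.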
Using the test statistics to select a model is a rather backward use of
hypothesis testing - a model can be "proved" wrong, but passing the test
does not mean that the model is right. Bayesian techniques have been
developed to select a good model (or models).
The start argument can be used to produce an analysis when the cells are
assigned different weights. (See Clogg and Eliason (1988).)
The start should be one over the weights.
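A minimal sketch of this weighting scheme, using a hypothetical 2 x 2 table and assumed cell weights w (for example, unequal exposure per cell):
obs <- array(c(30, 12, 18, 25), dim = c(2, 2))
w   <- array(c(2, 1, 1, 2), dim = c(2, 2))       # assumed cell weights
wtd <- loglin(obs, margin = list(1, 2), start = 1 / w, fit = TRUE)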
A suggested analysis strategy is to use the default settings to narrow down
the number of models, and then to set the fit and param options to TRUE
to investigate the more promising models further.
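For instance, again with the assumed HairEyeColor table, a quick screening call followed by a closer look at a promising model:
loglin(HairEyeColor, margin = list(c("Hair", "Eye"), "Sex"))   # screening pass
detail <- loglin(HairEyeColor, margin = list(c("Hair", "Eye"), "Sex"),
                 fit = TRUE, param = TRUE)
detail$fit      # fitted counts
detail$param    # estimated parameters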
Value
returns a list with the following components:
lrt |
the Likelihood Ratio Test statistic. This is often called either L squared
or G squared in the literature, and it is twice the discrimination information.
It is defined as 2 * sum(observed * log(observed/expected))
(see the sketch following this list).
|
pearson |
the Pearson test statistic (chi squared). It is defined as
sum((observed - expected)^2/expected).
|
df |
the degrees of freedom for the model fit. There is no adjustment for zeros;
the user must adjust for them.
|
margin |
a list of the margins that were fit.
This is the input margin, except
that the names of the factors are used if they are present.
|
fit |
an array like table, but containing fitted values.
This is returned only when the argument fit is TRUE.
|
param |
the estimated parameters of the model.
They are parametrized so that the (Intercept) component describes the overall
mean, the parameters for each single factor sum to zero, each two-factor
parameter sums to zero over both rows and columns, and so on.
This is returned only when the argument param is TRUE.
|
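As a check of the definitions above (again assuming the HairEyeColor table is available), the returned statistics can be recomputed from the fitted table; a loose tolerance allows for the single-precision fitting noted in Details:
res <- loglin(HairEyeColor, margin = list(c("Hair", "Sex"), c("Eye", "Sex")),
              fit = TRUE)
all.equal(res$lrt, 2 * sum(HairEyeColor * log(HairEyeColor / res$fit)),
          tolerance = 1e-4)
all.equal(res$pearson, sum((HairEyeColor - res$fit)^2 / res$fit),
          tolerance = 1e-4)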
Background
Log-linear analysis studies the relationships among a number of categorical
variables,
extending the idea of simply testing for independence of the factors.
Typically, the number of observations falling into each
combination of the levels of the variables (factors) is modeled.
The model, as the name suggests, is that the logarithm of the counts
follows a linear model depending on the levels of the factors.
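Equivalently (a standard correspondence, though not part of this page), a hierarchical log-linear model can be fit as a Poisson regression of the cell counts on the factors; assuming the HairEyeColor table and the glm function are available:
dat  <- as.data.frame(HairEyeColor)           # columns Hair, Eye, Sex, Freq
pois <- glm(Freq ~ Hair * Sex + Eye * Sex, family = poisson, data = dat)
ipf  <- loglin(HairEyeColor, margin = list(c("Hair", "Sex"), c("Eye", "Sex")),
               fit = TRUE)
max(abs(fitted(pois) - as.vector(ipf$fit)))   # small, up to the IPF tolerance eps
c(ipf$lrt, deviance(pois))                    # LRT statistic nearly equals the deviance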
References
Agresti, A. 1990. Categorical data analysis. New York, NY: Wiley.
Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.
Clogg, C. C. and Eliason, S. R. 1988. Some Common Problems in Log-Linear Analysis. In Common Problems/Proper Solutions (J. Scott Long, ed.). Newbury Park, CA: Sage Publications.
Fienberg, S. E. 1980. The Analysis of Cross-Classified Categorical Data. Second Edition. Cambridge, MA: MIT Press.
Haberman, S. J. 1972. Log-linear fit for contingency tables: Algorithm AS51. Applied Statistics. Volume 21. 218-225.
Lunneborg, C. E. and Abbott, R. D. 1983. Elementary Multivariate Analysis for the Behavioral Sciences. New York, NY: North-Holland.
See Also
table.
Examples
# Three-way table of counts from the market.survey data.
tbl <- with(Sdatasets::market.survey, table(income, age, education))
# Fit the model with the (age, income) and education margins.
loglin(tbl, margin = list(c("age", "income"), "education"))
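# A possible follow-up once this model looks promising (same tbl as above):
fit2 <- loglin(tbl, margin = list(c("age", "income"), "education"),
               fit = TRUE, param = TRUE)
names(fit2$param)   # "(Intercept)" plus one component per term in the model
fit2$fit            # fitted counts for each income x age x education cell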