model.frame
Construct or Extract a Model Frame
Description
Given a formula and some data, model.frame returns a data frame
with a terms attribute that contains sufficient information
for many fitting functions (e.g., lm or glm) to fit the
formula to the data. The model.frame will contain a column
for each elementary term in the formula, but no columns for interaction
terms. E.g., the formula ~sqrt(x)+log(x)*log(y) will cause it
to have 3 columns, sqrt(x), log(x), and log(y).
Given a fitted model, model.frame will return the
data.frame-with-terms object used to fit the model.
get_all_vars returns a data.frame containing a column
for each variable mentioned in the formula. E.g., the formula
~sqrt(x)+log(x)*log(y) will cause it to have 2 columns,
x and y.
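The difference between the two functions can be seen on a small made-up data frame. This is a minimal sketch: the data frame d and its values are illustrative assumptions, not part of the original examples.

```r
# Toy data; the values are arbitrary positive numbers so sqrt() and log() are defined.
d <- data.frame(x = c(1, 4, 9), y = c(2, 3, 5))
f <- ~ sqrt(x) + log(x) * log(y)

# model.frame: one column per elementary term, none for the interaction.
mf <- model.frame(f, data = d)
names(mf)

# get_all_vars: one column per raw variable mentioned in the formula.
av <- get_all_vars(f, data = d)
names(av)
```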
Function model.frame is an S Version 3 generic (see Methods);
method functions can be written to handle specific S Version 3 classes
of data. Besides the default method, classes that already have methods
for model.frame include
aovlist, glm and lm.
Usage
model.frame(formula , ...)
## S3 method for class 'lm':
model.frame(formula, ...)
## S3 method for class 'aovlist':
model.frame(formula, data = NULL, ...)
## Default S3 method:
model.frame(formula, data = NULL, subset = NULL, na.action = na.fail,
drop.unused.levels = FALSE, xlev = NULL, ...)
get_all_vars(formula, data = NULL, ...)
Arguments
formula |
the formula or other object defining what terms should be included in the model frame.
Besides being a formula object, this can be a fitted model of various
kinds, in which case the formula used in fitting the model defines
the terms.
|
data |
data frame from which the model frame is to be
constructed. After looking for variables in the
data argument, model.frame will look
in the environment of the formula, which is usually
the environment in which the formula was constructed.
|
subset |
a vector that specifies which rows of the data frame (data)
to use when constructing the model frame.
|
na.action |
a function to filter missing data. The default is the function na.fail.
|
drop.unused.levels |
If TRUE then unused levels in any factor will be
omitted from the "levels" attribute of the factor in the resulting model
frame. If FALSE (the default), unused factor levels will not be dropped,
so the model frame will contain factors with all of the original levels, used or not.
|
xlev |
an optional named list whose names identify factor columns contained in data
and whose values give the levels those factors should have in the resulting model frame.
Levels that do not occur in data may otherwise be dropped. If some levels are not
contained in data but should be retained in the resulting
model frame, xlev allows you to keep them. If xlev is
provided, drop.unused.levels will be ignored.
|
... |
other arguments passed to or from the methods,
such as data, subset, na.action, or weights.
|
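The following sketch illustrates how subset, na.action, drop.unused.levels, and xlev interact in the default method. The data frames d and d2 are made up for illustration, and the behavior shown is assumed to match the standard default method described above.

```r
# Toy data: one missing response and a factor with an unused level "c".
d <- data.frame(
  y = c(1.5, 2.0, NA, 4.0),
  g = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
)

# na.omit drops the row with the missing response instead of failing.
mf_omit <- model.frame(y ~ g, data = d, na.action = na.omit)
nrow(mf_omit)

# Without drop.unused.levels, the unused level "c" is retained.
levels(mf_omit$g)

# subset keeps only rows where g is "a"; unused levels are then dropped.
mf_sub <- model.frame(y ~ g, data = d, subset = g == "a",
                      na.action = na.omit, drop.unused.levels = TRUE)
levels(mf_sub$g)

# xlev states the levels explicitly, including one absent from the data.
d2 <- data.frame(y = c(1, 2), g = factor(c("a", "b")))
mf_xlev <- model.frame(y ~ g, data = d2, xlev = list(g = c("a", "b", "c")))
levels(mf_xlev$g)
```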
Details
The response and any extra variables other than subset are
stored in the data frame.
They should be retrieved from the frame
by using model.extract(fr, response) for response,
model.extract(fr, weights) for weights,
and so on for whatever names were used in the arguments to model.frame.
Other than subset, the names of such extras
are arbitrary; they only need to evaluate to a legitimate variable
for the data frame (e.g., a numeric vector, a matrix, or a factor).
The names of such variables are specially coded in the model frame
so as not to conflict with variable names occurring in the terms.
You should always use model.extract, which shares
the knowledge of the coded names with model.frame, rather than
assuming a specific coding.
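As a minimal sketch of this retrieval pattern, the following passes a weights argument and reads it back with model.extract. The data frame d and the column name w are illustrative assumptions.

```r
# Toy data with a response, one predictor, and a weights column.
d <- data.frame(y = c(1, 3, 2, 5), x = c(1, 2, 3, 4), w = c(1, 1, 2, 2))

# weights is passed through ... and stored under a specially coded column name.
fr <- model.frame(y ~ x, data = d, weights = w)
names(fr)

# Retrieve the extras by the name used in the call, not by the coded column name.
resp <- model.extract(fr, "response")
wts  <- model.extract(fr, "weights")
```

Going through model.extract keeps the calling code independent of how the extra columns happen to be named inside the frame.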
The function get_all_vars gets all variables from formula, or from
data if formula is not given.
Value
The function model.frame and its S3 methods return
a data frame
representing all the terms in the model (precisely, all those terms
of order 1; i.e., main effects), plus the response if any, and
any special extra variables (such as weight arguments to
fitting functions).
One such argument, subset, is handled specially.
If this argument is present, it is used to compute a subset of the
rows of the data.
It is this subset that is returned.
The returned data frame has an attribute terms containing the terms
object defined by the formula, constructed by the terms function.
get_all_vars returns a data frame containing the variables in formula or data.
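The shape of the returned value can be inspected directly. This is a small sketch on made-up data; the column z exists only to drive the subset and is an illustrative assumption.

```r
d <- data.frame(y = c(10, 12, 15, 11, 18),
                x = c(1, 2, 3, 4, 5),
                z = c(0, 1, 0, 1, 1))

# Only rows where z == 1 are kept; subset itself is not stored as a column.
mf <- model.frame(y ~ x, data = d, subset = z == 1)
nrow(mf)
names(mf)

# The terms attribute carries the terms object built from the formula.
tt <- attr(mf, "terms")
attr(tt, "term.labels")
```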
Note
Model frames are more typically produced as a side-effect of fitting a model
rather than directly by calling model.frame.
Functions like lm take an argument model (TRUE or FALSE)
that controls whether the model frame is stored as part of the
fitted model object.
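A brief sketch of the effect of the model argument, assuming lm stores the frame in a component named model (as in standard R). The data frame d is made up for illustration.

```r
d <- data.frame(y = c(1.1, 1.9, 3.2, 3.9, 5.1), x = 1:5)

fit_keep <- lm(y ~ x, data = d, model = TRUE)
fit_drop <- lm(y ~ x, data = d, model = FALSE)

# With model = TRUE the frame is stored on the fit; with FALSE it is not.
is.data.frame(fit_keep$model)
is.null(fit_drop$model)

# model.frame returns the stored frame when one is available.
mf_keep <- model.frame(fit_keep)

# Without a stored frame it has to be rebuilt, so d must still be available.
mf_drop <- model.frame(fit_drop)
names(mf_drop)
```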
References
Chambers, J. M. and Hastie, T. J. (Eds.) 1992. Statistical Models in S. Pacific Grove, CA: Wadsworth & Brooks/Cole. Chapter 3.
See Also
Examples
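# Model frame built from a formula, restricted to rows with wind < 9.7
model.frame(ozone ~ radiation + temperature, subset = (wind < 9.7),
    data = Sdatasets::air)
# Model frame recovered from a fitted aov model with an Error term
fit <- aov(plants ~ variety * treatment + Error(flats), data = Sdatasets::guayule)
model.frame(fit)
# All raw variables mentioned in the formula, for USA cars only
get_all_vars(Mileage ~ Disp. + log(HP),
    data = Sdatasets::car.all[Sdatasets::car.all$Country == "USA", ])
# Model frame recovered from a fitted glm
glm.fit <- glm(Kyphosis ~ ., family = binomial, data = Sdatasets::kyphosis)
model.frame(glm.fit)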