Fit Linear Regression Model

Description
Creates an object of class "lm" or "mlm" that represents a linear model fit.

Usage
lm(formula, data = environment(formula), subset, weights, 
   na.action = getOption("na.action"), method = "qr", 
   model = TRUE, x = FALSE, y = FALSE, qr = TRUE, 
   singular.ok = TRUE, contrasts = NULL, offset, ...)

Arguments
formula a formula object. The response variable, specified as either a single numeric variable or a matrix, must be on the left of the tilde (~) operator, and the explanatory variables must be on the right. Additive terms are combined with the plus operator (+). Interactions are encoded by placing an asterisk (*) between the interacting terms; a*b expands to the main effects plus their interaction, a + b + a:b. See the help file for formula.object for details on formula syntax.
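
A minimal sketch of the * expansion, assuming standard S/R formula handling. The data frame d and its columns a, b, and yv are hypothetical simulated data; the two fits should contain the same terms and coefficients.

```r
set.seed(1)
# Hypothetical data: two numeric predictors and a response
d <- data.frame(a = rnorm(20), b = rnorm(20))
d$yv <- 1 + 2 * d$a - d$b + 0.5 * d$a * d$b + rnorm(20, sd = 0.1)

# '*' is shorthand for the main effects plus their interaction
fit_star <- lm(yv ~ a * b, data = d)
fit_long <- lm(yv ~ a + b + a:b, data = d)

print(coef(fit_star))
print(coef(fit_long))
```
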
data If supplied, this is usually a data frame or named list containing the variables named in the formula, subset, weights, and offset arguments. Variables or functions not found in data are then expected to be found in environment(formula), which is usually the environment in which the formula was created.

data may also be an environment from which the variables and functions in the formula may be extracted, or a positive integer which is passed to parent.frame() to reference an environment in the call stack. If data is an environment (or positive integer signifying an environment), then environment(formula) is not used as a backup source of variables or functions.

If you do not supply a value for data, or you supply data = NULL, then all variables and functions in the formula must be accessible from environment(formula).
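
The sketch below compares a data frame with an environment as the data argument. It assumes list2env is available (as in R) and uses hypothetical simulated variables x and y; both fits should produce the same coefficients.

```r
set.seed(2)
# Hypothetical variables stored in a data frame
df <- data.frame(x = 1:10, y = 3 + 0.5 * (1:10) + rnorm(10))

# Variables looked up in the data frame
fit_df <- lm(y ~ x, data = df)

# The same variables supplied through an environment instead
e <- list2env(df)
fit_env <- lm(y ~ x, data = e)

print(coef(fit_df))
print(coef(fit_env))
```
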

subset a vector that specifies which subset of observations to use in the fit. This can be a logical vector that is replicated so that its length is equal to the number of observations, a numeric vector indicating the observation numbers to include, or a character vector of the observation names that should be included. By default, all observations are included.

The variables and functions used in the expression given to subset are searched for in the same manner as those in formula.
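
A minimal sketch of the three forms of subset, using a hypothetical data frame with row names obs1 to obs8. Each call should fit the same six observations.

```r
# Hypothetical data with named rows
df <- data.frame(x = 1:8,
                 y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1),
                 row.names = paste0("obs", 1:8))

# Three equivalent ways to use only the first six observations
fit_lgl <- lm(y ~ x, data = df, subset = x <= 6)               # logical expression
fit_num <- lm(y ~ x, data = df, subset = 1:6)                  # observation numbers
fit_chr <- lm(y ~ x, data = df, subset = paste0("obs", 1:6))   # observation names

print(coef(fit_lgl))
```
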

weights a numeric vector of observation weights. If supplied, the fitting algorithm minimizes the sum of the weights multiplied by the squared residuals; see the Details section for more information. length(weights) must equal the number of observations. Weights must not be negative, and because zero weights are ambiguous, we recommend that every weight be strictly positive. To exclude particular observations from the model, use the subset argument instead of assigning zero weights.

The variables and functions used in the expression given to weights are searched for in the same manner as those in formula.
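
To illustrate the criterion being minimized, this sketch compares a weighted lm fit with the solution of the weighted normal equations (X'WX) b = X'Wy, computed by hand. The data x, y, and weights w are hypothetical and simulated; the hand computation is only a check, not how lm fits internally (lm uses a QR decomposition).

```r
set.seed(4)
# Hypothetical data and strictly positive observation weights
x <- 1:12
y <- 2 + 0.8 * x + rnorm(12)
w <- runif(12, min = 0.5, max = 2)

fit_w <- lm(y ~ x, weights = w)

# Weighted least squares by hand: solve (X'WX) b = X'Wy,
# which minimizes sum(w * (y - X %*% b)^2).
# 'w * X' multiplies row i of X by w[i].
X <- cbind(1, x)
b_manual <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

print(coef(fit_w))
print(drop(b_manual))
```
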

na.action a function that filters missing data; it is applied to the model.frame after any subset argument has been applied. The default, na.fail, signals an error if any missing values are found. An alternative is na.exclude, which excludes observations that contain one or more missing values.
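
A minimal sketch contrasting na.fail and na.exclude on a hypothetical data frame with one missing response value. The string "error" is just a marker used here to record that the call failed.

```r
# Hypothetical data with one missing response value
df <- data.frame(x = 1:6, y = c(1.2, 2.1, NA, 3.8, 5.1, 6.0))

# na.fail stops with an error when missing values are present
res <- tryCatch(lm(y ~ x, data = df, na.action = na.fail),
                error = function(e) "error")

# na.exclude drops the incomplete row from the fit
fit_ex <- lm(y ~ x, data = df, na.action = na.exclude)
nobs(fit_ex)   # 5 observations used
```
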
method a character string that specifies the least squares fitting method. The only available method is "qr"; other values are accepted for historical reasons, but the QR method is always used. The pseudo-method "model.frame" causes lm to return only the model.frame containing the variables that would be used in the fitting process.
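
A short sketch of the "model.frame" pseudo-method on a hypothetical data frame. No fit is performed, and the column z, which the formula does not mention, is not part of the result.

```r
# Hypothetical data; z is not used in the formula
df <- data.frame(x = 1:5, y = c(2, 4, 5, 4, 5), z = letters[1:5])

# Return only the model frame; no fitting is done
mf <- lm(y ~ x, data = df, method = "model.frame")
names(mf)   # "y" "x"
```
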
model a logical value. If TRUE (the default), then the model.frame is returned as the model component of the fitted object.
x a logical value. If TRUE, then the model.matrix is returned as the x component of the fitted object. The default is FALSE.
y a logical value. If TRUE, then the response is returned as the y component of the fitted object. The default is FALSE.
qr a logical value. If TRUE (the default), then the QR decomposition of the model matrix is returned as the qr component of the fitted object.
singular.ok a logical value that specifies what to do if the explanatory variables are not all linearly independent (to within a small numerical tolerance). If FALSE, an error is signaled. If TRUE (the default), the coefficients of the redundant variables are set to NA.
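
A minimal sketch of both settings, using hypothetical data in which x2 is an exact multiple of x1. The string "error" is a marker recording that the second call failed.

```r
# Hypothetical data in which x2 is exactly 2 * x1
df <- data.frame(x1 = 1:6)
df$x2 <- 2 * df$x1
df$y <- c(1.1, 2.3, 2.9, 4.2, 5.1, 5.8)

# singular.ok = TRUE (the default): the redundant coefficient is NA
fit <- lm(y ~ x1 + x2, data = df)
coef(fit)["x2"]

# singular.ok = FALSE: the fit stops with an error
res <- tryCatch(lm(y ~ x1 + x2, data = df, singular.ok = FALSE),
                error = function(e) "error")
```
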
contrasts a list that gives contrasts for some or all of the factors that appear in the model formula. Each element of the list should have the same name as the factor variable it encodes, and it should be either a contrast matrix (any full-rank matrix with as many rows as there are levels in the factor) or a function that computes such a matrix given the number of levels. If contrasts is omitted or NULL, the contrasts specified by getOption("contrasts") are used.
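
The sketch below supplies a sum-to-zero contrast matrix, contr.sum(3), for a hypothetical three-level factor g in a balanced one-way layout. Changing the coding reparameterizes the model without changing the fitted values; under sum-to-zero coding the intercept equals the average of the group means.

```r
# Hypothetical balanced one-way layout with a three-level factor
df <- data.frame(g = factor(rep(c("a", "b", "c"), each = 4)),
                 y = c(5.1, 4.9, 5.3, 5.0, 6.2, 6.0, 6.1, 6.4,
                       7.0, 7.3, 6.8, 7.1))

# Default contrasts versus sum-to-zero contrasts given as a matrix
fit_default <- lm(y ~ g, data = df)
fit_sum <- lm(y ~ g, data = df, contrasts = list(g = contr.sum(3)))

print(coef(fit_default))
print(coef(fit_sum))
```
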
offset a numeric vector that is subtracted from the response before the model is fitted. Alternatively, the offset expression can be included on the right side of the formula as the term + offset(offsetExpression).

The variables and functions used in the expression given to offset are searched for in the same manner as those in formula.
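
A minimal sketch comparing the offset argument, an offset() term in the formula, and subtracting the offset from the response by hand. The data frame df and its column off are hypothetical; all three fits should give the same coefficients.

```r
set.seed(5)
# Hypothetical data with a known offset column
df <- data.frame(x = 1:10, off = seq(0.5, 5, by = 0.5))
df$y <- 1 + 0.3 * df$x + df$off + rnorm(10, sd = 0.2)

fit_arg  <- lm(y ~ x, data = df, offset = off)     # offset argument
fit_term <- lm(y ~ x + offset(off), data = df)     # offset() term in the formula
fit_sub  <- lm(I(y - off) ~ x, data = df)          # offset subtracted by hand

print(coef(fit_arg))
```
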

... additional arguments are passed to the fitting routine, lm.fit.
Value
An object of class "lm" or "mlm" that represents the linear model fit. For detailed information, see lm.object.
If the response is a matrix, the returned object has class "mlm" and the coefficients, residuals, and effects are also matrices with columns corresponding to the individual response variables. Otherwise the returned object has class "lm" and those components are vectors.
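
A minimal sketch of a matrix response, binding two hypothetical simulated responses y1 and y2 with cbind. The fit should inherit from "mlm", and its coefficient matrix should have one column per response.

```r
set.seed(6)
x <- 1:10
# Two hypothetical responses bound into a matrix
Y <- cbind(y1 = 1 + 2 * x + rnorm(10), y2 = 5 - x + rnorm(10))

fit_m <- lm(Y ~ x)
inherits(fit_m, "mlm")   # TRUE
coef(fit_m)              # 2 rows (terms) by 2 columns (responses)
```
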
Details
Generic functions such as print and summary have methods for displaying the results of a fit. For a description of the fit components, see lm.object. Use the extractor functions residuals, coefficients, and effects to extract components rather than accessing them directly with the $ operator. The extractor functions take account of special circumstances, such as models involving missing data or overdetermined models.
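
The missing-data case can be seen in a short sketch. With a hypothetical data frame containing one missing response and na.action = na.exclude, the stored component holds residuals only for the rows used in the fit, while the extractor pads the result back to the original length.

```r
# Hypothetical data with a missing response, fitted with na.exclude
df <- data.frame(x = 1:6, y = c(1.2, 2.1, NA, 3.8, 5.1, 6.0))
fit <- lm(y ~ x, data = df, na.action = na.exclude)

length(fit$residuals)    # 5: only the rows used in the fit
length(residuals(fit))   # 6: NA in the position of the excluded row
```
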
Observation weights are implemented through the weights argument to most regression functions. Observation weights are appropriate when the variances of individual observations are inversely proportional to the weights. For a set of weights w_i, one interpretation is that the ith observation is the average of w_i other observations, each having the same predictors and (unknown) variance. This is the interpretation of the weights in the claims example below. Another situation in which these weights arise is when you already know, and can supply, the relative precision of the observations.
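
The averaging interpretation can be checked with a sketch. The raw data below are hypothetical replicate responses at each x. Collapsing them to one mean per x and weighting each mean by its replicate count should reproduce the coefficients of the fit to the raw data.

```r
set.seed(7)
# Hypothetical raw data: several replicate responses at each x value
raw <- data.frame(x = rep(1:5, times = c(2, 3, 1, 4, 2)))
raw$y <- 1 + 0.5 * raw$x + rnorm(nrow(raw))

# One averaged observation per x, weighted by the number of replicates
avg <- data.frame(x = sort(unique(raw$x)),
                  y = as.vector(tapply(raw$y, raw$x, mean)),
                  n = as.vector(table(raw$x)))

fit_raw <- lm(y ~ x, data = raw)
fit_avg <- lm(y ~ x, data = avg, weights = n)

print(coef(fit_raw))
print(coef(fit_avg))
```
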
Important: An observation weight is not the same as a frequency, or case weight, which represents the number of times a particular observation is repeated. It is possible to include frequencies as a weights argument to a regression function; although this produces the correct coefficients for the model, inference tools such as standard errors, p-values, and confidence intervals are incorrect. In addition, weighted regression when the absolute precision of the observations is known is not currently supported. This situation arises often in physics and engineering, when the uncertainty associated with a particular measurement is known in advance due to properties of the measuring procedure or device.
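
The sketch below passes hypothetical frequency counts as weights and compares the result with a fit to the explicitly repeated rows. The coefficients should agree. The residual degrees of freedom and standard errors should differ, because the weighted fit counts only the unique rows.

```r
# Hypothetical unique observations and how often each occurred
uniq <- data.frame(x = c(1, 2, 3, 4, 5),
                   y = c(2.0, 2.9, 4.2, 4.8, 6.1),
                   freq = c(3, 1, 2, 4, 2))

# Frequencies passed as weights, versus repeating each row explicitly
fit_freq <- lm(y ~ x, data = uniq, weights = freq)
expanded <- uniq[rep(seq_len(nrow(uniq)), uniq$freq), ]
fit_rep  <- lm(y ~ x, data = expanded)

# Standard errors from each fit's estimated covariance matrix
se <- function(fit) sqrt(diag(vcov(fit)))
print(rbind(weighted = se(fit_freq), repeated = se(fit_rep)))
```
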
If you know the absolute precision of your observations, you can still supply the corresponding weights (the reciprocals of the known variances) to the weights argument. This computes the correct coefficients for your model, but the standard errors and other inference tools will be incorrect.
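
The reason is that lm still estimates a residual scale s, so the reported covariance is s^2 (X'WX)^-1. The sketch below shows one manual adjustment that assumes the variances are exactly known. It divides out s^2 to obtain (X'WX)^-1. This is a hand computation, not a feature of lm, and the data and the names sd_known, vcov_known, and se_known are hypothetical.

```r
set.seed(8)
# Hypothetical measurements with known standard deviations
x <- 1:8
sd_known <- c(0.5, 0.5, 1, 1, 2, 2, 1, 0.5)
y <- 3 + 1.5 * x + rnorm(8, sd = sd_known)
w <- 1 / sd_known^2

fit <- lm(y ~ x, weights = w)

# lm reports vcov(fit) = s^2 * (X'WX)^-1 with an estimated scale s.
# Dividing by s^2 gives (X'WX)^-1, the covariance when variances are known.
s <- summary(fit)$sigma
vcov_known <- vcov(fit) / s^2
se_known <- sqrt(diag(vcov_known))
print(se_known)
```
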
References
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.
Draper, N. R. and Smith, H. (1981). Applied Regression Analysis (second edition). New York: Wiley.
Myers, R. H. (1986). Classical and Modern Regression with Applications. Boston: Duxbury.
Rousseeuw, P. J. and Leroy, A. (1987). Robust Regression and Outlier Detection. New York: Wiley.
Seber, G. A. F. (1977). Linear Regression Analysis. New York: Wiley.
Weisberg, S. (1985). Applied Linear Regression (second edition). New York: Wiley.
There is a vast literature available on regression; the references above are just a small sample. The book by Myers is an introductory text that includes a discussion of many of the recent advances in regression technology. The Seber book is at a higher mathematical level and covers much of the classical theory of least squares.
See Also
lm.object, model.matrix, model.frame, glm, gam, loess, tree, lm.fit.
For a description of the syntax of formulas, see formula.object.
Examples
lm(y ~ ., data=Sdatasets::freeny)
summary(lm(Fuel ~ Weight + Disp., data=Sdatasets::fuel.frame))

# Formulas have intercepts by default, so include
# a -1 for regression without an intercept.
lm(Fuel ~ Weight - 1, data = Sdatasets::fuel.frame)

# Example of weighted regression
lm(cost ~ age + type + car.age, data = Sdatasets::claims,
   weights = number, na.action = na.exclude)

# Are there significant interactions between driver age
# and car type when modelling number of claims?
anova(lm(number ~ age + type, data = Sdatasets::claims),
      lm(number ~ age * type, data = Sdatasets::claims))

Package stats version 6.0.0-68