lsfit
Linear Least-Squares Fit

Description

Fits a (weighted) least squares multivariate regression. Returns a list containing the estimated coefficients, the residuals, and the QR decomposition of the matrix of explanatory variables.

Usage

lsfit(x, y, wt=NULL, intercept=TRUE, tolerance=1.e-07,
      yname=NULL)

Arguments

x vector or matrix of explanatory variables. If a matrix, each column represents a variable and each row represents an observation (or case). This should not contain a column of 1s unless the argument intercept is FALSE. The number of rows of x should equal the number of observations in y, and there should be fewer columns than rows. NAs and Infs are allowed but will be removed.
y response variable(s): a vector or a matrix with one column for each regression. NAs and Infs are allowed but will be removed.
wt vector of weights with length equal to the number of observations. If the observations have unequal variances, wt should be inversely proportional to the variance. By default, an unweighted regression is carried out. NAs and Infs are allowed but will be removed.
intercept if TRUE, a constant (intercept) term is included in each regression.
tolerance numerical value used to test for singularity in the regression.
yname vector of names to be used for the y variates in the regression output. However, if y is a matrix with a dimnames attribute containing column names, then these will be used.
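The interaction between the x and intercept arguments can be illustrated with a small sketch (data simulated for illustration; this assumes the standard S/R behavior of lsfit): supplying a column of 1s with intercept=FALSE reproduces the default fit.

```r
# Default fit: lsfit adds the intercept column internally.
set.seed(6)
x <- rnorm(25)
y <- 3 + 0.5 * x + rnorm(25)
f1 <- lsfit(x, y)

# Equivalent fit: supply the column of 1s by hand and suppress the intercept.
f2 <- lsfit(cbind(1, x), y, intercept = FALSE)

all.equal(as.vector(f1$coef), as.vector(f2$coef))  # TRUE
```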

Details

An observation is considered unusable if there is an NA or Inf in any response variable, any explanatory variable, or in the weight (if present) for that observation. If your data have several missing values, there may be much better ways of analyzing them than discarding observations in this way; see, for instance, chapter 10 of Weisberg (1985).
The lsfit function does least squares regression, that is, it finds a set of parameters such that the (weighted) sum of squared residuals is minimized. The (implicit) assumption of least squares is that the errors have a Gaussian distribution; if there are outliers, the results of the regression may be misleading.
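The minimization described above can be checked directly: the coefficients returned by lsfit coincide with the solution of the normal equations (X'X)b = X'y. A minimal sketch, with data simulated for illustration:

```r
# Least squares: lsfit finds b minimizing sum((y - X b)^2).
set.seed(1)
x <- cbind(x1 = rnorm(50), x2 = rnorm(50))
y <- 2 + 3 * x[, "x1"] - 1.5 * x[, "x2"] + rnorm(50, sd = 0.1)

fit <- lsfit(x, y)                      # intercept column added automatically

# The same solution from the normal equations (X'X) b = X'y,
# with an explicit column of 1s for the intercept.
X <- cbind(Intercept = 1, x)
b <- solve(t(X) %*% X, t(X) %*% y)

all.equal(as.vector(fit$coef), as.vector(b))  # TRUE
```

lsfit computes the solution through a QR decomposition rather than the normal equations, which is numerically more stable when the columns of x are nearly collinear.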
The assumptions of regression are that the observations are statistically independent, that the response y is linear in the covariates represented by x, and that there is no error in x.
A time series model is one alternative if the observations are not independent. A robust regression can help if there are gross errors in x (e.g., typographical errors), since these will likely make the corresponding responses appear to be gross outliers; such points are likely to have high leverage (see hat). If the x matrix is not known with certainty (an "errors-in-variables" model), the regression coefficients will typically be biased downward.
The classical use of a weighted regression is to handle the case when the variability of the response is not the same for all observations. Another approach to this same problem is to transform y and/or the variables in x so that there is constant variance and linearity holds. In practice it is often the case that a transformation which helps linearity also improves problems with the variance. If a choice is to be made, the linearity is more important since a weighted regression can be used.
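A weighted fit of the kind described above can be sketched as follows (data simulated for illustration; the comparison with lm, which is listed under See Also, assumes both functions minimize the same weighted sum of squares):

```r
# Responses with unequal variance: weight inversely to the variance.
set.seed(2)
x <- seq(1, 10, length = 40)
sd.i <- 0.2 * x                       # noise standard deviation grows with x
y <- 1 + 2 * x + rnorm(40, sd = sd.i)
w <- 1 / sd.i^2                       # wt proportional to 1/variance

wfit <- lsfit(x, y, wt = w)

# The same weighted fit via lm() for comparison.
lfit <- lm(y ~ x, weights = w)
all.equal(as.vector(wfit$coef), as.vector(coef(lfit)))
```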
It is good data analysis practice to view plots to check the suitability of a solution. Appropriate plots include the residuals versus the fit, the residuals versus the x variables, and a qqplot of the residuals.
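The three diagnostic plots just mentioned can be produced from an lsfit result like this (data simulated for illustration; fitted values are recovered as y minus the residuals, since lsfit does not return them directly):

```r
# Diagnostic plots for an lsfit result.
set.seed(5)
x <- rnorm(40)
y <- 1 + 2 * x + rnorm(40)
fit <- lsfit(x, y)

fitted <- y - fit$residuals     # fitted values = response minus residuals

plot(fitted, fit$residuals)     # residuals versus the fit
plot(x, fit$residuals)          # residuals versus the x variable
qqnorm(fit$residuals)           # qqplot of the residuals
```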
Polynomial regression can be performed with lsfit by using a command similar to cbind(x, x^2). It is better numerical practice to create orthogonal polynomials, especially as the order of the polynomial increases. When orthogonal polynomials are not used, the columns of the x matrix can be quite collinear (one column is close to being a linear combination of other columns). Collinearity outside of the polynomial regression case can cloud interpretation of the results as well as being a numerical concern.
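The two approaches to polynomial regression can be compared directly. The sketch below uses poly() to build orthogonal polynomial columns (data simulated for illustration); both parameterizations span the same column space, so the fitted values and residuals agree, but the orthogonal version is better conditioned:

```r
# Quadratic fit two ways: raw powers versus orthogonal polynomials.
set.seed(3)
x <- seq(0, 1, length = 30)
y <- 4 - 3 * x + 5 * x^2 + rnorm(30, sd = 0.05)

raw.fit  <- lsfit(cbind(x, x^2), y)    # raw power columns can be collinear
orth.fit <- lsfit(poly(x, 2), y)       # orthogonal polynomial columns

# Same fitted values, hence the same residuals, from either basis.
all.equal(raw.fit$residuals, orth.fit$residuals, check.attributes = FALSE)
```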
Value
a list representing the result of the regression, with the following components:
coef vector or matrix of coefficients. This is a matrix only if y has more than one column, in which case coef contains one column for each regression with optional constant terms in the first row. Its dimnames are taken from x, y and yname if applicable.
residuals object like y containing residuals. This component is not present when x is a big data object.
wt if wt was given as an argument, it is also returned as part of the result.
intercept logical value: records whether an intercept was used in this regression.
qr object of class "qr" representing the numerical decomposition of the x matrix (plus a column of 1s, if an intercept was included). If wt was specified, the qr object will represent the decomposition of the weighted x matrix. See function qr for the details of this object. It is used primarily with functions like qr.qty, that compute auxiliary results for the regression from the decomposition.
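One use of the returned qr component with qr.qty is to recover the residual sum of squares without forming the residuals: the last n - p elements of Q'y carry the part of y orthogonal to the column space of x. A sketch with simulated data, assuming an unweighted full-rank fit:

```r
# Auxiliary results from the qr component via qr.qty().
set.seed(4)
x <- matrix(rnorm(60), ncol = 2)               # 30 observations, 2 variables
y <- as.vector(x %*% c(1, -2)) + rnorm(30)

fit <- lsfit(x, y)
p <- ncol(x) + 1                               # columns plus the intercept
qty <- qr.qty(fit$qr, y)                       # Q'y from the decomposition

# The last n - p elements of Q'y give the residual sum of squares.
rss.from.qr <- sum(qty[-(1:p)]^2)
all.equal(rss.from.qr, sum(fit$residuals^2))
```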
details If x is a big data object, the results include a details component. This is a list of summary information available from the big data linear regression routine that is not part of the standard lsfit results.
References
Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.
Weisberg, S. 1985. Applied Linear Regression (2nd ed.). New York: Wiley.
See Also
lm, ls.print, ls.diag, hat for leverage, qr, qr.coef, abline, ppr, cancor.
Examples
regfreeny <- lsfit(Sdatasets::freeny.x, Sdatasets::freeny.y)
ls.print(regfreeny)
Package stats version 6.1.1-7