Linear regression models the numeric response column as a weighted sum of the predictor columns. The weights, also known as the regression coefficients, are selected by the method of least squares, which minimizes the sum of the squared differences between the observed response and the predictions based on the weighted sum.
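
As a minimal sketch of the least squares idea, the Python example below fits weights to a small, made-up data set using NumPy. The data values and variable names are illustrative assumptions, and no intercept term is included because the description above does not mention one; the example does not show how this product computes its fit.

```python
import numpy as np

# Hypothetical data: four rows, two predictor columns, one numeric response.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
y = np.array([8.1, 3.9, 9.2, 16.8])


def sum_squared_error(weights):
    """Sum of squared differences between observed and predicted response."""
    residuals = y - X @ weights
    return float(residuals @ residuals)


# Least squares: choose the weights that minimize sum_squared_error.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

print("regression coefficients:", weights)
print("sum of squared errors:", sum_squared_error(weights))

# Any other choice of weights gives a sum of squared errors at least as large.
print("after nudging the weights:", sum_squared_error(weights + 0.1))
```

The fitted weights come out close to 2 and 3, the values used to generate the made-up response before a small amount of noise was added.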

Any predictor column with character data is expanded into a set of indicator columns, one for each unique value in the character column. The indicator column for a character value is 1 where the corresponding entry in the original column contains that value and 0 otherwise. Each character data column used as a predictor should have a small number of unique values relative to the total number of rows in the data set, because every unique value adds another column, and another coefficient, to the model.
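
The expansion described above can be sketched in a few lines of Python. The function name expand_to_indicators and the sample column are hypothetical, chosen only for illustration; the product's own expansion may order or name the indicator columns differently.

```python
def expand_to_indicators(column):
    """Return one list of 0/1 indicators per unique value in a character column."""
    levels = sorted(set(column))  # one indicator column per unique value
    return {
        level: [1 if entry == level else 0 for entry in column]
        for level in levels
    }


# Hypothetical character column with three unique values across four rows.
color = ["red", "green", "red", "blue"]
indicators = expand_to_indicators(color)

for level, flags in indicators.items():
    print(level, flags)
# blue [0, 0, 0, 1]
# green [0, 1, 0, 0]
# red [1, 0, 1, 0]
```

Each row has a 1 in exactly one indicator column. A character column with many unique values would produce correspondingly many indicator columns, which is why a small number of unique values is recommended.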

See also:

Details on Regression Modeling – General