Stepwise Model Builder - Linear Regression Introductory Overview

The purpose of the Statistica Stepwise Model Builder - Linear Regression module is to facilitate the identification of linear regression models based on predictors chosen by the user at each step. The final linear regression model can then be saved in XML/PMML form or directly deployed to Statistica Enterprise.

The module handles both continuous predictors and categorical predictors with multiple degrees of freedom, and automatically moves the latter into or out of the regression equation as a single step.
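To illustrate what "multiple degrees of freedom" means here, the following is a minimal sketch, not Statistica's implementation. It assumes simple indicator (dummy) coding with one reference level, so a categorical predictor with k levels contributes k - 1 columns that enter or leave the equation together. The function names, the "region" variable, and its values are hypothetical.

```python
def indicator_columns(values, reference=None):
    """Expand a categorical predictor into k - 1 indicator columns.

    The reference level (by default the first level in sorted order)
    gets no column of its own; this coding scheme is an assumption.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    return {
        level: [1.0 if v == level else 0.0 for v in values]
        for level in levels
        if level != reference
    }


def enter_block(model, name, block):
    """Add every column of a categorical predictor in one step."""
    for level, column in block.items():
        model[f"{name}[{level}]"] = column


def remove_block(model, name):
    """Remove every column of a categorical predictor in one step."""
    for key in [k for k in model if k.startswith(f"{name}[")]:
        del model[key]


# Hypothetical categorical predictor with three levels (2 degrees of freedom).
region = ["north", "south", "west", "south", "north", "west"]
block = indicator_columns(region)

model = {}                      # column name -> column values
enter_block(model, "region", block)
columns_after_entry = sorted(model)

remove_block(model, "region")
columns_after_removal = sorted(model)

print(columns_after_entry)
print(columns_after_removal)
```

Treating the indicator columns as one block keeps the model interpretable: the predictor is either in the equation with all of its levels or not in it at all.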

At each step, Statistica computes statistics for the predictors in the current model and for the candidate predictors not yet in the equation. It also computes statistics that reflect the overall quality of the model.
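The text does not list which statistics are reported. As one common example of a statistic for candidate predictors, the sketch below computes a partial F statistic ("F to enter") for each candidate by comparing the residual sum of squares with and without it. The data, the variable names x1 to x3, and the small least-squares solver are all hypothetical and serve only as an illustration.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum(
        (yi - sum(b * v for b, v in zip(beta, row))) ** 2
        for row, yi in zip(design, y)
    )


def f_to_enter(in_model, candidate, y):
    """Partial F statistic for adding one single-df candidate to the current model."""
    rss_without = rss(in_model, y)
    rss_with = rss(in_model + [candidate], y)
    # Residual df after entry: n - (intercept + current predictors + candidate).
    df_resid = len(y) - (len(in_model) + 2)
    return (rss_without - rss_with) / (rss_with / df_resid)


# Hypothetical data: y depends mostly on x1.
y = [2.3, 3.8, 6.1, 7.6, 10.2, 11.9, 14.4, 15.7]
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [1, 0, 0, 1, 1, 0, 0, 1],
}

current_model = []  # start from the intercept-only model
stats = {name: f_to_enter(current_model, col, y) for name, col in candidates.items()}
best = max(stats, key=stats.get)

for name, value in stats.items():
    print(f"{name}: F to enter = {value:.2f}")
print("Strongest candidate:", best)
```

In a manual stepwise workflow, the analyst would review such a table at each step and decide which candidate to enter, rather than always taking the largest value.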

Thus, you can build models manually by entering the most important predictors into the regression equation one step at a time, using criteria of statistical significance as well as policy and other criteria. By moving selected variables or groups of variables into the prediction equation and removing others from it, you can perform what-if (scenario) analyses to assess the impact of certain model assumptions, policies, or regulatory constraints (for example, restrictions on predictors that are not permitted). Analysts can therefore build models that are parsimonious and consistent with policies, guidelines, and regulatory constraints, yet as accurate as possible.

Linear Regression

The general purpose of multiple regression is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.

Personnel professionals customarily use multiple regression procedures to determine equitable compensation. The analyst first identifies a number of factors or dimensions, such as "amount of responsibility" (Resp) or "number of people to supervise" (No_Super), that are believed to contribute to the value of a job. The analyst then usually conducts a salary survey among comparable companies in the market, recording the salaries and respective characteristics (that is, values on those dimensions) for different positions. This information can be used in a multiple regression analysis to build a regression equation of the form:

Salary = .5 * Resp + .8 * No_Super

Once this so-called regression line has been determined, the analyst can easily construct a graph of the expected (predicted) salaries and the actual salaries of job incumbents in his or her company. Thus, the analyst is able to determine which positions are underpaid (below the regression line), overpaid (above the regression line), or paid equitably.
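The comparison described above can be sketched as follows, using the example equation Salary = .5 * Resp + .8 * No_Super. The positions, their values, and the tolerance that defines "paid equitably" are hypothetical and chosen only for illustration.

```python
def predicted_salary(resp, no_super):
    """Apply the example equation Salary = .5 * Resp + .8 * No_Super."""
    return 0.5 * resp + 0.8 * no_super


def classify(actual, predicted, tolerance=0.5):
    """Compare actual with predicted salary.

    The tolerance is an assumed band around the prediction within
    which a position counts as paid equitably.
    """
    residual = actual - predicted
    if residual < -tolerance:
        return "underpaid"
    if residual > tolerance:
        return "overpaid"
    return "paid equitably"


# Hypothetical positions: (Resp, No_Super, actual salary),
# in the same arbitrary units as the example equation.
positions = {
    "Analyst": (10, 0, 4.0),
    "Team lead": (20, 5, 14.0),
    "Manager": (30, 10, 25.0),
}

status = {}
for title, (resp, no_super, actual) in positions.items():
    expected = predicted_salary(resp, no_super)
    status[title] = classify(actual, expected)
    print(f"{title}: predicted {expected:.1f}, actual {actual:.1f}, {status[title]}")
```

A negative residual (actual minus predicted) corresponds to a point below the regression line, and a positive residual to a point above it.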