Example 7: Simple Regression Analysis
Ribbon bar. Select the Home tab. In the File group, click the Open arrow and select Open Examples to display the Open a Statistica Data File dialog box. Open the data file, which is located in the Datasets folder.
Classic menus. From the File menu, select Open Examples to display the Open a Statistica Data File dialog box. Open the data file, which is located in the Datasets folder.
The following image shows the Variable Specifications Editor, which lists information for each variable. To display the Editor:
Ribbon bar. Select the Data tab. In the Variables group, click All Specs.
Classic menus. From the Data menu, select All Variable Specs.
Click the Cancel button to close the Editor.
One possible hypothesis is that population change and the percent of families below poverty level are related. It seems reasonable to expect that poverty will lead to outward migration; thus, there should be a negative correlation between the percent below poverty level and population change. Accordingly, you will treat variable 1 (POP_CHNG) as the predictor variable.
Ribbon bar. Select the Statistics tab. In the Advanced/Multivariate group, click Advanced Models and from the menu, select General Linear to display the General Linear Models (GLM) Startup Panel.
Classic menus. From the Statistics - Advanced Linear/Nonlinear Models submenu, select General Linear Models to display the General Linear Models (GLM) Startup Panel.
Select Simple regression as the Type of analysis, select Quick specs dialog as the Specification method, and then click the OK button to display the GLM Simple Regression Quick Specs dialog box.
Click the Variables button to display the standard variable selection dialog box. Select PT_POOR in the Dependent variable list, POP_CHNG as the Predictor variable, and then click the OK button to return to the GLM Simple Regression Quick Specs dialog box.
To view the syntax program automatically generated from the specifications, click the Syntax editor button in the GLM Simple Regression Quick Specs dialog box to display the GLM Analysis Syntax Editor.
The remainder of the specifications for this analysis can use the default specifications, so click the OK (Run) button in the GLM Analysis Syntax Editor or the OK button in the GLM Simple Regression Quick Specs dialog box to perform the analysis.
Reviewing Results
In the POP_CHNG row, Param. column, the unstandardized regression coefficient for the regression of PT_POOR on POP_CHNG is -0.40374. This means that for each one-unit decrease in population change (POP_CHNG), there is a predicted 0.40374-unit increase in the percent of families below poverty level (PT_POOR). The upper and lower (default) 95% confidence limits for this unstandardized coefficient do not include zero, so the regression coefficient is significant at p<.05. Note that the standardized coefficient, which is also the Pearson correlation coefficient for simple regression designs, is -.65, which means that for each standard deviation decrease in population change there is a predicted .65 standard deviation increase in poverty.
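The link between the unstandardized coefficient, the standardized coefficient, and the Pearson correlation can be checked outside Statistica. The following is a minimal sketch in Python (standard library only). The data values and the function names simple_regression and pearson_r are made up for illustration and are not taken from the example data file, so the numbers it produces will differ from the results above.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x.

    Returns (slope, intercept, standardized_slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # unstandardized coefficient
    intercept = mean_y - slope * mean_x
    # Rescale the slope by sd(x) / sd(y) to get the standardized coefficient.
    beta = slope * math.sqrt(sxx / syy)
    return slope, intercept, beta

def pearson_r(x, y):
    """Pearson correlation computed directly from the sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values standing in for POP_CHNG (predictor) and PT_POOR (dependent).
pop_chng = [-5.0, -2.0, 0.0, 3.0, 6.0, 10.0]
pt_poor = [30.0, 26.0, 27.0, 20.0, 18.0, 15.0]

slope, intercept, beta = simple_regression(pop_chng, pt_poor)
r = pearson_r(pop_chng, pt_poor)
print("slope:", slope)
print("standardized slope:", beta)
print("Pearson r:", r)
```

With a single predictor, the standardized slope and the Pearson r printed by this sketch agree up to floating-point rounding, which is the property the paragraph above relies on.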
Right-click on the PT_POOR Param. column heading in the spreadsheet that was just created, and select Graphs of Input Data - Histogram PT_POOR - Normal Fit from the resulting shortcut menu to display the following default histogram.
Via the Histogram command on the Graphs tab or menu, you can produce the histogram of variable PT_POOR with more intervals. (On the 2D Histograms dialog box - Quick tab, click the Variables button and select PT_POOR, and then click the OK button; then enter 16 in the Categories box in the Intervals group box, and click the OK button.) As you can see in the next image, the distribution for this variable deviates somewhat from the normal distribution. However, even though two counties (in the two right-most columns) have a higher percentage of families below the poverty level than what would be expected according to the normal distribution, they still seem to be sufficiently "within range."
This decision is somewhat subjective; a general rule is that one needs to be concerned if an observation (or observations) falls outside the mean ± 3 times the standard deviation. In that case, it is wise to repeat critical analyses with and without the outlier(s) to ensure that they did not seriously affect the pattern of intercorrelations.
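The mean ± 3 standard deviations rule can be sketched in a few lines of Python (standard library only). The function name flag_outliers and the data values are hypothetical and are not taken from the example data file. The sketch uses the sample standard deviation, which is an assumption about how the rule is applied.

```python
import statistics

def flag_outliers(values, k=3.0):
    """Return the observations that fall outside mean ± k * standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    lower = mean - k * sd
    upper = mean + k * sd
    return [v for v in values if v < lower or v > upper]

# Hypothetical percentages: 19 typical counties plus one extreme value.
percent_poor = [20, 21, 19, 22, 18, 20, 21, 19, 20, 22,
                18, 21, 20, 19, 20, 21, 19, 20, 22, 60]

outliers = flag_outliers(percent_poor)
print("Flagged observations:", outliers)

# Repeat a summary with and without the flagged values to compare.
trimmed = [v for v in percent_poor if v not in outliers]
print("Mean with outliers:   ", statistics.mean(percent_poor))
print("Mean without outliers:", statistics.mean(trimmed))
```

Note that in very small samples a single extreme value inflates the standard deviation enough that it may never fall outside the 3 standard deviation range, so the rule is most useful with a reasonable number of observations.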
Right-click on the cell (correlation) that intersects the POP_CHNG column and the PT_POOR row, and select Graphs of Input Data - Scatterplot by - Regression, 95% conf. from the resulting shortcut menu. A variable selection dialog box is displayed, with PT_POOR selected as the X variable. Select POP_CHNG as the Y variable, and click the OK button to produce the default scatterplot.
This scatterplot illustrates the substantial negative correlation (-.65) between the two variables. It also shows the 95% confidence limits for the regression line; that is, you can be 95% certain that the actual regression line in the population falls within the limits defined by the two curved, dashed lines.
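The curved shape of the confidence limits follows from the standard formula for the confidence interval of the fitted mean response. The following Python sketch (standard library only) shows one way to compute those limits at a given X value. The function name confidence_band and the data are hypothetical, and the critical value 2.306 is the two-sided 95% t value for 8 degrees of freedom (10 observations minus 2 estimated parameters), supplied by hand because the standard library has no t distribution.

```python
import math

def confidence_band(x, y, x0, t_crit):
    """Confidence limits for the regression line of y on x, evaluated at x0.

    t_crit is the two-sided critical t value for n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    fit = intercept + slope * x0
    # The half-width grows as x0 moves away from the mean of x.
    half_width = t_crit * s * math.sqrt(1.0 / n + (x0 - mean_x) ** 2 / sxx)
    return fit - half_width, fit + half_width

# Hypothetical data: 10 observations, so n - 2 = 8 degrees of freedom.
x_vals = [-5.0, -3.0, -1.0, 0.0, 2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y_vals = [31.0, 27.0, 28.0, 24.0, 23.0, 21.0, 22.0, 17.0, 16.0, 13.0]
T_CRIT_95_DF8 = 2.306

at_mean = confidence_band(x_vals, y_vals, 3.0, T_CRIT_95_DF8)   # 3.0 is the mean of x_vals
at_edge = confidence_band(x_vals, y_vals, 12.0, T_CRIT_95_DF8)
print("Limits at the mean of X:", at_mean)
print("Limits at the largest X:", at_edge)
```

The limits are narrowest at the mean of X and widen toward the extremes, which is why the dashed lines in the scatterplot curve away from the regression line at both ends.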
The test for the POP_CHNG regression coefficient confirms that POP_CHNG is strongly related to PT_POOR, p<.001.
See also GLM - Index.