Workspace Node: GLZ Custom Design - Results - Residuals 1 Tab
In the GLZ Custom Design node dialog box, under the Results heading, select the Residuals 1 tab to access options for producing spreadsheets and plots of basic predicted and residual statistics. For details on computing and interpreting these residual statistics, refer to McCullagh and Nelder (1989).
Element Name | Description |
---|---|
Sample | |
Analysis, Cross-validation, Both | Select the respective option button under Sample to specify the sample on which the predicted and residual statistics are based. Select Analysis to produce spreadsheets for all observations used to compute the current results. Select Cross-validation for all observations that were not used to compute the current results but have valid data for all predictor and dependent variables. Select Both for all observations in the Analysis and Cross-validation samples combined. If these options are unavailable, no cross-validation sample was specified on the Advanced tab. |
Basic Residuals | Select this check box to produce a spreadsheet with the raw residuals, Pearson residuals, and deviance residuals. For continuous distributions of the dependent (response) variable, scaled Pearson residuals and scaled deviance residuals are also computed. |
Predicted values | Select this check box to produce a spreadsheet with the observed and predicted values, the linear predictor values and their standard errors, and the confidence limits for the predicted values. Cases whose observed values fall outside the respective confidence interval for the predicted values are highlighted in the spreadsheet.
As described in the Computational Approach section of the GLZ Overviews, the generalized linear model assumes the following relationship between the predictors $X_1, \dots, X_k$ and the (observed) response variable $Y$: $Y = g(b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k) + e$, where $b_0, b_1, \dots, b_k$ are the parameter estimates, $e$ is the error, and $g(\cdot)$ is a known function. The items in the spreadsheet are:
Response value: the value of the response variable $Y$ for each observation.
Pred. value: the predicted value $g(b_0 + b_1X_1 + \dots + b_kX_k)$ for each observation.
Linear pred: the linear predictor $b_0 + b_1X_1 + \dots + b_kX_k$ for each observation.
Standard error: the estimated standard error of the linear predictor for each observation.
Lower CL: the lower confidence limit for the predicted value for each observation.
Upper CL: the upper confidence limit for the predicted value for each observation. |
Std. Residuals | Select this check box to produce a spreadsheet with leverage values, studentized Pearson residuals, studentized deviance residuals, and likelihood residuals. |
Class & odds ratio | This option is available only for classification-type analyses (with a categorical dependent variable). Select this check box to produce a spreadsheet containing the classification of cases and odds ratios. |
Other obs. Stats | Select this check box to produce a spreadsheet with the following statistics:
Generalized Cook's distances.
Differential chi-square values, which measure the change in the Pearson chi-square statistic when the ith observation is removed.
Differential deviance residuals, which measure the change in the deviance statistic when the ith observation is removed. |
Lift chart | This option is available only for classification-type analyses in which the categorical dependent variable is binary, i.e., contains exactly two distinct values. The lift chart visually summarizes how useful a statistical model is for predicting a binomial (categorical) outcome (dependent) variable. Specifically, the chart shows the gain you can expect from using the predictive model compared with using baseline information only.
See lift charts for details on interpretation. Refer also to the Rapid Deployment of Models module documentation for methods that produce overlaid (comparative) lift and gains charts for multiple predictive models and for multinomial responses (with more than two categories). |
Conf. lev | Enter the confidence level used to construct the confidence limits in the respective results spreadsheets and graphs. The default is 95%. |
Plots of predicted and residual values | Use the options in this group box to produce various plots of predicted and residual values. |
Pred. values | Produces a histogram of the predicted values. |
Residuals | Produces a histogram of the raw residuals. |
Observ. values | Produces a histogram of the observed dependent (response) variable values. |
Pearson Resid | Produces a histogram of the Pearson residuals. |
P-plot of observ | Produces a normal probability plot of the observed dependent (response) variable values; this option is only available for continuous distributions of the dependent (response) variable. |
Pred. & resids. | Produces a scatterplot of the predicted values vs. the residuals. |
Observ. & pred. | Produces a scatterplot of the observed values vs. predicted values. |
Observ. & resids | Produces a scatterplot of the observed values vs. residuals. |
Res. & case no | Produces a scatterplot of the residuals vs. case numbers. |
P-plot of resids | Produces a normal probability plot of the raw residuals. |
Bin number | Specify the number of bins for the histograms available on this tab (see above). STATISTICA does not always produce histograms with exactly the number of bins you specify. Instead, it uses the number of bins closest to your specification that still yields "neat" intervals. |
Aggregation | Select this check box to compute the predicted values (and related statistics, e.g., residuals) in terms of predicted frequencies. In models with categorical response variables, predicted values and related statistics can be computed either for the raw data or for aggregated frequency counts.
For example, in the binomial case with raw data, you can think of the response variable as having two possible values: 0 (zero) or 1. Accordingly, the predicted values fall in the range from 0 (zero) to 1 (e.g., classification probabilities).
If the Aggregation check box is selected, STATISTICA uses the aggregated (tabulated) data set instead. In that case, you can think of the response variable as a frequency count reflecting the number of observations that fall into the respective categories. This is easiest to picture when the predictors are also categorical: the aggregated data file is then simply a multi-way frequency table. |
Options / C / W | See Common Options. |
OK | Click the OK button to accept all the specifications made in the dialog box and to close it. The analysis results will be placed in the Reporting Documents node after running (updating) the project. |
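A minimal Python sketch can illustrate the three statistics produced by the Basic Residuals option. It assumes a Poisson model, so the variance function is V(μ) = μ, and it assumes the fitted means are already available. The function names `poisson_basic_residuals` and `scaled` are hypothetical and are not part of STATISTICA. The formulas are the standard textbook definitions (McCullagh and Nelder, 1989), and STATISTICA's internal computation may differ in detail. The scaled versions, which apply to continuous distributions with an estimated dispersion parameter φ, simply divide each residual by √φ.

```python
import math


def poisson_basic_residuals(y, mu):
    """Raw, Pearson, and deviance residuals for a Poisson model (variance V(mu) = mu, mu > 0)."""
    raw, pearson, deviance = [], [], []
    for yi, mi in zip(y, mu):
        r = yi - mi
        raw.append(r)
        # Pearson residual: raw residual divided by the square root of the variance function.
        pearson.append(r / math.sqrt(mi))
        # Poisson unit deviance; the term y*log(y/mu) is taken as 0 when y == 0.
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d = 2.0 * (term - r)
        # Deviance residual: square root of the unit deviance, carrying the sign of y - mu.
        deviance.append(math.copysign(math.sqrt(max(d, 0.0)), r))
    return raw, pearson, deviance


def scaled(residuals, phi):
    """Scale residuals by the square root of an estimated dispersion parameter phi."""
    return [r / math.sqrt(phi) for r in residuals]
```

For example, `poisson_basic_residuals([2, 0, 3], [1.0, 1.0, 3.0])` gives raw and Pearson residuals of 1, -1, and 0. The corresponding deviance residuals are about 0.879, -1.414, and 0.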
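For the Predicted values option and the Conf. lev setting, the following sketch computes the linear predictor, the predicted value, and the confidence limits for a single observation. It also flags observations whose observed value lies outside the interval. The sketch makes several assumptions:

- The model is logistic, so g(·) is the inverse logit.
- The standard error of the linear predictor is already known.
- The interval is built on the linear-predictor scale and then mapped through g. This is a common approach but not necessarily STATISTICA's exact method.

`predict_with_ci` and `inverse_logit` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def inverse_logit(t):
    """Inverse of the logit link; plays the role of g(...) in this example."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_with_ci(b, x, se_eta, y_obs=None, conf=0.95):
    """Linear predictor, predicted value, and confidence limits for one observation.

    b      -- coefficients [b0, b1, ..., bk]
    x      -- predictor values [x1, ..., xk]
    se_eta -- standard error of the linear predictor (assumed to be given)
    """
    eta = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    # Two-sided critical value of the standard normal distribution for the confidence level.
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    # Limits on the linear-predictor scale, mapped through g (valid because g is increasing).
    lower = inverse_logit(eta - z * se_eta)
    upper = inverse_logit(eta + z * se_eta)
    result = {"linear_pred": eta, "pred": inverse_logit(eta), "se": se_eta,
              "lower": lower, "upper": upper}
    if y_obs is not None:
        # True when the observed value lies outside the confidence interval.
        result["outside"] = not (lower <= y_obs <= upper)
    return result
```

With `b = [0.0, 1.0]`, `x = [0.0]`, and `se_eta = 0.5`, the predicted value is 0.5 and the 95% limits are symmetric around it. An observed value of 1 falls outside the interval and would be flagged.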
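The Std. Residuals statistics can be sketched with the usual generalized linear model definitions:

- Leverage: h_i = w_i x_i'(X'WX)^(-1) x_i, where w_i are the working weights.
- Studentized residuals: r / √(φ(1 - h_i)).
- Likelihood residual: Williams' leverage-weighted combination of the studentized Pearson and deviance residuals.

The helper names are hypothetical. The caller is assumed to supply the working weights (for ordinary least squares they are all 1), and STATISTICA's exact formulas may differ.

```python
import math


def _invert(a):
    """Gauss-Jordan inverse of a small non-singular square matrix given as a list of lists."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        # Partial pivoting: move the row with the largest entry in column c to position c.
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        # Eliminate column c from every other row.
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def leverages(X, w):
    """Diagonal of the hat matrix: h_i = w_i * x_i' (X'WX)^(-1) x_i."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
            for j in range(p)]
    inv = _invert(xtwx)
    return [wi * sum(xi[j] * inv[j][k] * xi[k] for j in range(p) for k in range(p))
            for xi, wi in zip(X, w)]


def studentized_residuals(r_pearson, r_deviance, h, phi=1.0):
    """Studentized Pearson, studentized deviance, and likelihood residuals."""
    sp, sd, lik = [], [], []
    for rp, rd, hi in zip(r_pearson, r_deviance, h):
        denom = math.sqrt(phi * (1.0 - hi))
        a, b = rp / denom, rd / denom
        sp.append(a)
        sd.append(b)
        # Likelihood residual: leverage-weighted mix of the two, signed like y - mu.
        lik.append(math.copysign(math.sqrt(hi * a * a + (1.0 - hi) * b * b), rp))
    return sp, sd, lik
```

As a check, the leverages always sum to the number of model parameters. For an intercept plus one predictor with values 0, 1, 2, 3 and unit weights, they are 0.7, 0.3, 0.3, and 0.7.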
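For the Other obs. Stats option, commonly used one-step approximations express each statistic through an observation's Pearson residual r_P, deviance residual r_D, and leverage h, so the model does not have to be refit with the observation removed. These approximations appear, for example, in Pregibon's diagnostics for logistic regression. The formulas used below are:

- Generalized Cook's distance: r_P² h / (p φ (1 - h)²).
- Differential chi-square: r_P² / (1 - h).
- Differential deviance: r_D² + r_P² h / (1 - h).

The sketch assumes these inputs and the number of parameters p are already available. It illustrates the general idea and is not STATISTICA's documented formula. `influence_stats` is a hypothetical name.

```python
def influence_stats(r_pearson, r_deviance, h, p, phi=1.0):
    """One-step approximations to the influence of deleting each observation.

    r_pearson, r_deviance -- Pearson and deviance residuals per observation
    h                     -- leverage values per observation
    p                     -- number of model parameters
    phi                   -- dispersion parameter (1 for binomial and Poisson models)
    """
    cook, dchi2, ddev = [], [], []
    for rp, rd, hi in zip(r_pearson, r_deviance, h):
        # Generalized Cook's distance.
        cook.append(rp * rp * hi / (p * phi * (1.0 - hi) ** 2))
        # Approximate change in the Pearson chi-square when the observation is removed.
        dchi2.append(rp * rp / (1.0 - hi))
        # Approximate change in the deviance when the observation is removed.
        ddev.append(rd * rd + rp * rp * hi / (1.0 - hi))
    return cook, dchi2, ddev
```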
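The Class & odds ratio option can be illustrated for a binary response. Each case is assigned to the class indicated by its predicted probability, and the odds ratio of the resulting 2×2 table of observed by predicted classes is (TP·TN)/(FP·FN). The 0.5 cutoff and the use of this cross-product ratio are assumptions made for illustration. `classification_table` is a hypothetical name.

```python
def classification_table(y_obs, p_hat, cutoff=0.5):
    """Observed-by-predicted 2x2 table for a 0/1 response, plus its odds ratio."""
    tp = fn = fp = tn = 0
    for y, p in zip(y_obs, p_hat):
        predicted = 1 if p >= cutoff else 0
        if y == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    # Cross-product ratio; infinite or undefined (nan) when an off-diagonal cell is empty.
    if fp * fn > 0:
        odds_ratio = (tp * tn) / (fp * fn)
    else:
        odds_ratio = float("inf") if tp * tn > 0 else float("nan")
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn, "odds_ratio": odds_ratio}
```

An odds ratio well above 1 indicates that the predicted classes agree with the observed classes more often than chance alone would suggest.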
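A cumulative lift curve of the kind summarized by the Lift chart option can be sketched in three steps:

1. Rank the cases by predicted probability.
2. Split the ranking into bins.
3. For each bin k, divide the response rate among the cases in the first k bins by the overall (baseline) response rate.

The default of 10 bins (deciles) and the name `cumulative_lift` are assumptions. The sketch also assumes at least one positive response, so that the baseline rate is nonzero.

```python
import math


def cumulative_lift(y_obs, p_hat, n_bins=10):
    """Cumulative lift for each bin of cases ranked by predicted probability (y in {0, 1})."""
    order = sorted(range(len(y_obs)), key=lambda i: p_hat[i], reverse=True)
    y_sorted = [y_obs[i] for i in order]
    n = len(y_sorted)
    base_rate = sum(y_sorted) / n  # response rate using baseline information only
    lifts = []
    for k in range(1, n_bins + 1):
        m = math.ceil(n * k / n_bins)  # number of top-ranked cases in the first k bins
        lifts.append((sum(y_sorted[:m]) / m) / base_rate)
    return lifts
```

A lift of about 3.3 in the first decile means that the top 10% of cases, as ranked by the model, contain roughly 3.3 times as many responses as a random 10% would. The last value is always 1.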
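The difference between raw and aggregated data described under Aggregation can be shown with a small sketch. Raw 0/1 records are collapsed into event and trial counts for each predictor pattern, and a predicted probability for each pattern is turned into a predicted frequency (trials × probability). The record layout and function names are hypothetical.

```python
from collections import defaultdict


def aggregate(rows):
    """Collapse raw (pattern, y) records, with y in {0, 1}, into (events, trials) per pattern."""
    table = defaultdict(lambda: [0, 0])
    for pattern, y in rows:
        table[pattern][0] += y  # number of observations in the "1" category
        table[pattern][1] += 1  # number of observations with this predictor pattern
    return {k: tuple(v) for k, v in table.items()}


def predicted_frequencies(table, p_hat):
    """Predicted event counts n_j * p_j, given a predicted probability p_j per pattern."""
    return {k: trials * p_hat[k] for k, (events, trials) in table.items()}
```

With a single categorical predictor, the aggregated result is simply a two-way frequency table of predictor level by response category.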
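The Bin number entry notes that STATISTICA adjusts the bin count to keep intervals "neat" but does not say how. One widely used technique, shown here purely as an assumption and not as STATISTICA's algorithm, rounds the raw bin width to 1, 2, or 5 times a power of ten and then widens the range to whole multiples of that width. `nice_bins` is a hypothetical name, and the sketch assumes hi > lo.

```python
import math


def nice_bins(lo, hi, requested):
    """Return (start, stop, width, count) for histogram bins with 'neat' widths."""
    raw = (hi - lo) / requested
    mag = 10 ** math.floor(math.log10(raw))  # power of ten just below the raw width
    frac = raw / mag
    # Round the leading part of the width to 1, 2, 5, or 10.
    if frac < 1.5:
        nice = 1
    elif frac < 3:
        nice = 2
    elif frac < 7:
        nice = 5
    else:
        nice = 10
    width = nice * mag
    start = math.floor(lo / width) * width
    stop = math.ceil(hi / width) * width
    return start, stop, width, round((stop - start) / width)
```

For data from 0 to 13 with 5 requested bins, the raw width 2.6 is rounded to 2, which yields 7 bins from 0 to 14. This is the kind of "closest neat" result the entry describes.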
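The P-plot of observ and P-plot of resids options produce normal probability plots. Such a plot pairs each sorted value with its expected standard normal quantile, so approximately normal data fall near a straight line. The sketch uses Blom's plotting positions (i - 0.375)/(n + 0.25), which is one common choice. STATISTICA may use a different plotting-position formula, and `normal_probability_points` is a hypothetical name.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """(sorted value, expected standard normal quantile) pairs for a normal probability plot."""
    xs = sorted(values)
    n = len(xs)
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [(x, nd.inv_cdf((i - 0.375) / (n + 0.25))) for i, x in enumerate(xs, start=1)]
```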