Sample Options in GLZ
The options described here are available in the Sample group box on the GLZ Results - Summary tab.
- Analysis
- Select the Analysis option button to display spreadsheets for all observations that were used to compute the current results (the Analysis sample).
- Cross-validation
- Select the Cross-validation option button to display spreadsheets for all observations that were not used to compute the current results, but have valid data for all predictor and dependent variables (the Cross-validation sample).
- Both
- Select the Both option button to display spreadsheets for all observations in both the Analysis sample and the Cross-validation sample.
Note: if the above three option buttons are dimmed, no cross-validation sample was specified on the Quick Specs Dialog - Advanced tab, or via the Sample keyword in the Analysis Syntax Editor.
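As a rough illustration of how the three Sample options divide the observations, the following Python sketch assumes that each row carries a hypothetical `in_analysis` flag and uses `None` for missing values. Neither assumption reflects how STATISTICA stores samples internally. The sketch only mirrors the definitions above.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (in_analysis, x1, x2, y); None marks a missing value.
Row = Tuple[bool, Optional[float], Optional[float], Optional[float]]


def has_valid_data(row: Row) -> bool:
    """True if every predictor and the dependent variable are non-missing."""
    return all(v is not None for v in row[1:])


def split_samples(rows: List[Row]):
    """Return the Analysis, Cross-validation, and Both samples."""
    # Analysis sample: observations used to compute the current results.
    analysis = [r for r in rows if r[0]]
    # Cross-validation sample: not used for estimation, but complete on
    # all predictor and dependent variables.
    cross_validation = [r for r in rows if not r[0] and has_valid_data(r)]
    both = analysis + cross_validation
    return analysis, cross_validation, both


rows = [
    (True, 1.0, 2.0, 0.0),
    (True, 0.5, 1.0, 1.0),
    (False, 2.0, 3.0, 1.0),   # held out, complete -> cross-validation
    (False, None, 1.0, 0.0),  # held out, missing x1 -> excluded
]
analysis, cv, both = split_samples(rows)
```

The fourth row is excluded from every sample because it was not used for estimation and is incomplete.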
- Goodness of fit
- Click the Goodness of fit button to display a spreadsheet showing the Pearson Chi-square statistic, deviance statistic, scaled Pearson Chi-square statistic, scaled deviance statistic, log-likelihood value, AIC, and BIC for the current (overall) model (see also the Introductory Overview for details). All of these statistics except the log-likelihood, AIC, and BIC are asymptotically Chi-square distributed, so large values of these statistics relative to their degrees of freedom imply that the model does not fit the data well; the ratio of each statistic to its degrees of freedom is displayed in the last column of the spreadsheet. For models with a binomial distribution, the Cox-Snell R², the Nagelkerke R², and the Hosmer-Lemeshow test are also computed. Note that these results always pertain to the overall model with all effects, regardless of which effects were selected by any Model building procedures on the Quick Specs Dialog - Advanced tab; see the Results for stepwise or best-subset regression note in the GLZ Results topic for details.
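The Python sketch below shows how these statistics can be computed for ungrouped 0/1 (binomial) data, given fitted probabilities `p`, the number of estimated parameters `k`, and the log-likelihood `ll_null` of the intercept-only model. It uses textbook definitions: AIC = -2ℓ + 2k, BIC = -2ℓ + k ln(n), Cox-Snell R² = 1 - exp(-2(ℓ - ℓ₀)/n), and Nagelkerke R² as Cox-Snell divided by its maximum attainable value. These are assumptions; STATISTICA may count parameters or scale some statistics differently. The function name `binary_gof` and the `phi` argument (dispersion, default 1.0) are hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_gof(y: Sequence[int], p: Sequence[float], k: int,
               ll_null: float, phi: float = 1.0) -> Dict[str, float]:
    """Goodness-of-fit summary for a 0/1 response (hypothetical helper)."""
    n = len(y)
    # Bernoulli log-likelihood of the fitted model.
    ll = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
             for yi, pi in zip(y, p))
    # For ungrouped 0/1 data the saturated model has log-likelihood 0,
    # so the deviance reduces to -2 * ll.
    deviance = -2.0 * ll
    pearson = sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, p))
    df = n - k
    cox_snell = 1.0 - math.exp(-2.0 * (ll - ll_null) / n)
    # Largest value Cox-Snell R^2 can reach for this null model.
    cs_max = 1.0 - math.exp(2.0 * ll_null / n)
    return {
        "log_likelihood": ll,
        "deviance": deviance,
        "pearson": pearson,
        "scaled_deviance": deviance / phi,
        "scaled_pearson": pearson / phi,
        "deviance_df_ratio": deviance / df,
        "pearson_df_ratio": pearson / df,
        "aic": -2.0 * ll + 2 * k,
        "bic": -2.0 * ll + k * math.log(n),
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / cs_max,
    }


# Illustrative data: fitted probabilities from a model with one binary predictor.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.25] * 4 + [0.75] * 4
result = binary_gof(y, p, k=2, ll_null=8 * math.log(0.5))
```

With the default dispersion of 1.0, the scaled statistics equal the unscaled ones.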
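Global null hypothesis tests are used to test whether the parameter estimates are significantly different from zero, that is, the null hypothesis that all of the tested parameters are zero. Under this null hypothesis, each statistic is assumed to have an asymptotic chi-square distribution with p degrees of freedom, where p is the number of parameters being tested.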
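Likelihood ratio test: $\mathrm{LR} = 2\,[\ell(\hat{\beta}) - \ell(\hat{\beta}_0)]$, where $\ell(\hat{\beta})$ is the log-likelihood for the estimated model and $\ell(\hat{\beta}_0)$ is the log-likelihood for the null model.
Score test: $S = U(\hat{\beta}_0)^{\top}\, V(\hat{\beta}_0)\, U(\hat{\beta}_0)$, where $U(\hat{\beta}_0)$ is the gradient of the log-likelihood and $V(\hat{\beta}_0)$ is the variance-covariance matrix, both evaluated for the null model.
Wald test: $W = \hat{\beta}^{\top}\, [-H(\hat{\beta})]\, \hat{\beta}$, where $\hat{\beta}$ is the vector of parameter estimates for the full model and $H(\hat{\beta})$ is the Hessian matrix of the log-likelihood for the full model, so that $-H(\hat{\beta})$ is the inverse of the variance-covariance matrix of $\hat{\beta}$.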
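To make the three global tests concrete, the Python sketch below fits a logistic regression with one predictor by Newton-Raphson and tests the hypothetical null hypothesis that the slope is zero (p = 1). Assumptions: the design columns are (1, x); the information matrix XᵀWX is used, which for the logit link equals the negative Hessian; and, as is common practice when the intercept is not tested, the Wald statistic uses the inverse of the slope's variance rather than the full-parameter quadratic form. All function names are illustrative, not STATISTICA's.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]


def probs(beta: Vec, x: Sequence[float]) -> Vec:
    """Logit-link probabilities for the linear predictor b0 + b1*x."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * xi))) for xi in x]


def loglik(beta: Vec, x: Sequence[float], y: Sequence[int]) -> float:
    p = probs(beta, x)
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def gradient(beta: Vec, x: Sequence[float], y: Sequence[int]) -> Vec:
    """Score vector U = X^T (y - p)."""
    r = [yi - pi for yi, pi in zip(y, probs(beta, x))]
    return [sum(r), sum(xi * ri for xi, ri in zip(x, r))]


def information(beta: Vec, x: Sequence[float]) -> Mat:
    """X^T W X with W = p(1 - p); equals -Hessian for the logit link."""
    w = [pi * (1.0 - pi) for pi in probs(beta, x)]
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    return [[sum(w), s1], [s1, sum(wi * xi * xi for wi, xi in zip(w, x))]]


def inv2(m: Mat) -> Mat:
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fit(x: Sequence[float], y: Sequence[int], iters: int = 50) -> Vec:
    """Newton-Raphson: beta <- beta + I(beta)^-1 U(beta)."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        u, v = gradient(beta, x, y), inv2(information(beta, x))
        step = [v[0][0] * u[0] + v[0][1] * u[1], v[1][0] * u[0] + v[1][1] * u[1]]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < 1e-12:
            break
    return beta


def global_tests(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float, float]:
    """LR, score, and Wald statistics for H0: slope = 0."""
    beta = fit(x, y)
    ybar = sum(y) / len(y)
    beta0 = [math.log(ybar / (1.0 - ybar)), 0.0]  # null-model MLE
    lr = 2.0 * (loglik(beta, x, y) - loglik(beta0, x, y))
    u0 = gradient(beta0, x, y)
    v0 = inv2(information(beta0, x))  # variance-covariance at the null model
    score = sum(u0[i] * v0[i][j] * u0[j] for i in range(2) for j in range(2))
    wald = beta[1] ** 2 / inv2(information(beta, x))[1][1]
    return lr, score, wald


x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
lr, score, wald = global_tests(x, y)
```

For this small data set the three statistics are close but not identical (about 2.09, 2.00, and 1.81); they agree only asymptotically.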
- HL Groups
- Specify the number of groups, g, used in the computation of the Hosmer-Lemeshow goodness of fit test. STATISTICA will sort the predicted probabilities and try to create g groups of equal size.
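A minimal Python sketch of the Hosmer-Lemeshow computation, assuming the usual textbook form: observations are sorted by predicted probability, cut into g consecutive groups of (nearly) equal size, and the statistic is the sum over groups of (O - E)² / (n·p̄·(1 - p̄)), referred to a chi-square distribution with g - 2 degrees of freedom. How STATISTICA handles ties or unequal group sizes is not stated above, so this grouping rule is an assumption.

```python
from typing import Sequence, Tuple


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float],
                    g: int = 10) -> Tuple[float, int]:
    """Return (HL statistic, degrees of freedom) using g groups."""
    n = len(y)
    # Sort observations by predicted probability; ties are not treated specially.
    order = sorted(range(n), key=lambda i: p[i])
    # Group boundaries giving g groups whose sizes differ by at most one.
    bounds = [k * n // g for k in range(g + 1)]
    stat = 0.0
    for k in range(g):
        idx = order[bounds[k]:bounds[k + 1]]
        if not idx:
            continue
        n_k = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        p_bar = expected / n_k
        # A group with p_bar of exactly 0 or 1 would need special handling.
        stat += (observed - expected) ** 2 / (n_k * p_bar * (1.0 - p_bar))
    return stat, g - 2


y = [0, 0, 1, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
hl, df = hosmer_lemeshow(y, p, g=4)
```

With 8 observations and g = 4, each group holds two observations; with ties or sizes not divisible by g, other programs may form slightly different groups.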
- Aggregation
- Select the Aggregation check box to compute the predicted values (and related statistics, e.g., residuals) in terms of predicted frequencies. In models with categorical response variables, predicted values (and related statistics, e.g., residuals) can be computed either for the raw data or for aggregated frequency counts. For example, in the Binomial case (see Distribution and link function) with raw data, you can think of the response variable as having two possible values: 0 (zero) or 1. Accordingly, the predicted values should fall in the range from 0 (zero) to 1 (e.g., classification probabilities). If the Aggregation check box is selected, STATISTICA will instead consider the aggregated (tabulated) data set. In that case, you can think of the response variable as a frequency count reflecting the number of observations that fall into the respective categories. This is easiest to picture when the predictors are also categorical: the resulting aggregated data file is then simply a multi-way frequency table.
- Aggreg. data.
- Click the Aggreg. data button to review the aggregated data in a spreadsheet.
For example, suppose your data contain a binary dependent (response) variable; the raw (non-aggregated) data may look like this:
After aggregation, these data can be represented as follows:
where the values in the column labeled y1 are the numbers of observations with y = 1, and the values in the column labeled y0 are the numbers of observations with y = 0.
When you select the Aggregation check box, STATISTICA will convert the raw data into the aggregated representation, and by clicking the Aggreg. data button, you can review the aggregated data in a spreadsheet. Remember that selecting the Aggregation check box will also affect the computation (and display) of predicted and residual values; see the description of the Aggregation check box (above) for details.
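The aggregation step described above can be sketched in Python as follows. The predictor names, their values, and the helper functions are hypothetical; the sketch only shows how raw rows with a 0/1 response collapse into y1/y0 counts per predictor pattern, and how a predicted probability for a pattern turns into predicted frequencies when the data are aggregated.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, str]


def aggregate(rows: Sequence[Tuple[str, str, int]]) -> Dict[Pattern, Tuple[int, int]]:
    """Collapse raw (x1, x2, y) rows into {(x1, x2): (y1, y0)} counts."""
    ones: Counter = Counter()
    zeros: Counter = Counter()
    for x1, x2, y in rows:
        if y == 1:
            ones[(x1, x2)] += 1
        else:
            zeros[(x1, x2)] += 1
    # Counter returns 0 for patterns that never occurred with that response.
    return {pat: (ones[pat], zeros[pat]) for pat in sorted(set(ones) | set(zeros))}


def predicted_frequencies(table: Dict[Pattern, Tuple[int, int]],
                          prob: Dict[Pattern, float]) -> Dict[Pattern, Tuple[float, float]]:
    """Turn a predicted probability of y = 1 into predicted (y1, y0) counts."""
    out = {}
    for pat, (y1, y0) in table.items():
        n = y1 + y0
        out[pat] = (n * prob[pat], n * (1.0 - prob[pat]))
    return out


raw = [("A", "low", 1), ("A", "low", 0), ("A", "low", 1),
       ("B", "high", 0), ("B", "high", 0), ("A", "high", 1)]
table = aggregate(raw)
pred = predicted_frequencies(table, {("A", "low"): 0.5, ("A", "high"): 0.9,
                                     ("B", "high"): 0.2})
```

In the raw representation the model would predict a probability for each of the six rows; in the aggregated one it predicts a count for each of the three patterns.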
- Raw data
- Click the Raw data button to display a spreadsheet with the design matrix, values of the dependent (response) variable, case weights, values of the count variable (if one was selected, and the current distribution is Binomial, Multinomial, or Ordinal multinomial; see Specification dialogs and syntax and the Introductory Overview), and the values of the offset variable (if one was selected).
- Overdispersion
- In models whose response variable follows a Poisson, Binomial, Multinomial, or Ordinal multinomial distribution (see Distribution and link function), the default dispersion parameter (1.0) for the generalized linear/nonlinear model (i.e., for the exponential family of distributions) may not be adequate. In that case, you can select the Overdispersion check box and then select either the Pearson Chi2 or Deviance option button to choose how the dispersion parameter is estimated.
If you specify Deviance, the dispersion parameter is estimated by the deviance divided by its degrees of freedom. If you specify Pearson Chi2, the dispersion parameter is estimated by Pearson's chi-square statistic divided by its degrees of freedom. The adjustment is reflected in the scale parameter, which is proportional to the dispersion parameter.
Changing the overdispersion parameter will affect the computation (values) of the parameter variances and covariances and the model likelihood, and all related statistics (e.g., standard errors, prediction errors, etc.). For details, refer to McCullagh and Nelder, 1989.
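The two dispersion estimates can be sketched in Python for a Poisson model. The fitted means `mu` and the parameter count `k` below are illustrative inputs, not output from STATISTICA. The sketch assumes the common quasi-likelihood adjustment in which parameter variances are multiplied by the dispersion estimate, so standard errors are multiplied by its square root. It does not reproduce STATISTICA's adjustment of the likelihood itself.

```python
import math
from typing import List, Sequence, Tuple


def poisson_dispersion(y: Sequence[float], mu: Sequence[float],
                       k: int) -> Tuple[float, float]:
    """Return (Pearson Chi2 / df, deviance / df) for a Poisson model."""
    df = len(y) - k
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    # Poisson deviance; y * log(y / mu) is taken as 0 when y == 0.
    deviance = 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )
    return pearson / df, deviance / df


def adjusted_se(se: Sequence[float], phi: float) -> List[float]:
    """Variances scale by phi, so standard errors scale by sqrt(phi)."""
    return [s * math.sqrt(phi) for s in se]


y = [0, 3, 9]
mu = [2.0, 2.0, 8.0]  # hypothetical fitted means
phi_pearson, phi_deviance = poisson_dispersion(y, mu, k=1)
se_adjusted = adjusted_se([0.2], 4.0)
```

The two estimates can differ noticeably in small samples, as here (1.3125 versus about 2.28), which is why the choice between the Pearson Chi2 and Deviance options matters.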
See also GLZ - Index.