Goodness of Fit Calculations Overview

Overview
The purpose of the Goodness of Fit module of STATISTICA (STATISTICA Data Miner) is to serve as a general tool for evaluating models for prediction of continuous dependent variables (see Dependent vs. Independent Variables) and for predictive classification. The program computes goodness-of-fit statistics based on observed and predicted values or classifications, and produces summary graphs. The module can be used in conjunction with virtually all statistical procedures for building predictive models for continuous or categorical variables (regression and classification problems, respectively).

The program expects as input a variable containing observed values or classifications, and one or more variables containing the predicted values or classifications from one or more different models.

Goodness-of-Fit Statistics
Various goodness-of-fit summary statistics can be computed for continuous and categorical dependent variables. Most of these statistics are discussed in greater detail in Witten and Frank (2000); in the context of forecasting, different statistics are also discussed in Makridakis and Wheelwright (1983).
Goodness-of-fit statistics for regression problems (for continuous variables)
For continuous variables, the program will compute:
  • Least squares deviation (LSD), mean square error
  • Average deviation, mean absolute error
  • Relative squared error, mean relative squared error
  • Correlation coefficient (Pearson product moment correlation)

See Computational Details for additional details.
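
As an illustration of the statistics listed above, the following Python sketch computes them from one observed variable and one predicted variable. It is not STATISTICA code. The function name regression_fit_stats and the result keys are hypothetical, and the formulas are common textbook definitions (for example, relative squared error as the squared error divided by the squared deviation of the observed values from their mean). They are assumptions here, and the exact definitions used by the program are the ones given in Computational Details.

```python
import math


def regression_fit_stats(observed, predicted):
    """Goodness-of-fit summaries for a continuous dependent variable.

    Hypothetical helper using common textbook definitions; the program's
    own formulas may differ in detail.
    """
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equally long sequences with at least 2 cases")

    errors = [p - o for o, p in zip(observed, predicted)]
    sum_sq_error = sum(e * e for e in errors)

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((o - mean_obs) * (p - mean_pred)
                for o, p in zip(observed, predicted))

    nan = float("nan")
    return {
        # Assumed reading of "least squares deviation": sum of squared errors.
        "sum_squared_error": sum_sq_error,
        "mean_square_error": sum_sq_error / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        # Squared error relative to always predicting the observed mean.
        "relative_squared_error": sum_sq_error / ss_obs if ss_obs > 0 else nan,
        # Pearson product moment correlation of observed and predicted values.
        "correlation": (cross / math.sqrt(ss_obs * ss_pred)
                        if ss_obs > 0 and ss_pred > 0 else nan),
    }


stats = regression_fit_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

To compare several models, the same function can be called once per predicted variable against the same observed variable.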

Goodness-of-fit statistics for classification problems (for categorical variables)
For categorical variables, the program will compute:
  • Pearson Chi-square
  • G-square (maximum likelihood Chi-square)
  • Percent disagreement (misclassification rate)

See Computational Details for additional details.
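
The classification statistics above can be sketched in Python as follows. It is not STATISTICA code. The function name classification_fit_stats is hypothetical, and the sketch assumes that both Chi-square statistics are computed on the cross-tabulation of observed by predicted classes, with expected counts taken from the row and column totals. The exact formulas used by the program are the ones given in Computational Details.

```python
import math
from collections import Counter


def classification_fit_stats(observed, predicted):
    """Goodness-of-fit summaries for a categorical dependent variable.

    Hypothetical helper; assumes the statistics are computed on the
    observed-by-predicted contingency table.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two equally long, non-empty sequences")

    cells = Counter(zip(observed, predicted))   # joint counts
    row_totals = Counter(observed)              # counts per observed class
    col_totals = Counter(predicted)             # counts per predicted class

    chi_square = 0.0
    g_square = 0.0
    for obs_class, row_total in row_totals.items():
        for pred_class, col_total in col_totals.items():
            expected = row_total * col_total / n
            count = cells.get((obs_class, pred_class), 0)
            chi_square += (count - expected) ** 2 / expected
            if count > 0:  # empty cells contribute nothing to G-square
                g_square += 2.0 * count * math.log(count / expected)

    misclassified = sum(1 for o, p in zip(observed, predicted) if o != p)
    return {
        "pearson_chi_square": chi_square,
        "g_square": g_square,
        "percent_disagreement": 100.0 * misclassified / n,
    }


result = classification_fit_stats(["a", "a", "b", "b"], ["a", "b", "b", "b"])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption, larger Chi-square values indicate a stronger association between observed and predicted classes, while percent disagreement directly measures how often the two differ.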