Goodness of Fit Calculations Overview

Overview

The purpose of the Goodness of Fit module of STATISTICA (STATISTICA Data Miner) is to serve as a general tool for evaluating models for prediction of continuous dependent variables (see Dependent vs. Independent Variables) and for predictive classification. The program will compute various goodness-of-fit statistics based on observed and predicted values or classifications, and produce various summary graphs. The module can be used in conjunction with virtually all statistical procedures for building predictive models for continuous or categorical variables (regression and classification problems, respectively).

The program expects as input a variable containing observed values or classifications, and one or more variables containing the predicted values or classifications from one or more different models.
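
As a rough illustration of this input layout, the following Python sketch represents the data as one observed column plus one predicted column per model. The column names (`observed`, `pred_model_a`, `pred_model_b`), the sample values, and the helper `split_inputs` are hypothetical and are not part of STATISTICA.

```python
# Hypothetical input table: one column of observed values and one
# column of predicted values for each model being compared.
data = {
    "observed": [10.0, 12.0, 15.0, 11.0],
    "pred_model_a": [9.5, 12.5, 14.0, 11.0],
    "pred_model_b": [10.0, 13.0, 15.5, 12.5],
}


def split_inputs(table, observed_name):
    """Separate the observed column from the per-model prediction columns."""
    observed = table[observed_name]
    predictions = {
        name: column for name, column in table.items() if name != observed_name
    }
    # Every prediction column must have one value per observed case.
    for name, column in predictions.items():
        if len(column) != len(observed):
            raise ValueError(f"column {name!r} does not match the observed length")
    return observed, predictions


observed, predictions = split_inputs(data, "observed")
```

Each entry in `predictions` can then be evaluated against `observed` separately, so that several models are compared on the same cases.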

Goodness-of-Fit Statistics

Various goodness-of-fit summary statistics can be computed for continuous and categorical dependent variables. Most of these statistics are discussed in greater detail in Witten and Frank (2000); in the context of forecasting, different statistics are also discussed in Makridakis and Wheelwright (1983).

Goodness of fit statistics for regression problems

For continuous variables, the program will compute:

Least squares deviation (LSD), mean square error
Average deviation, mean absolute error
Relative squared error, mean relative squared error
Correlation coefficient (Pearson product moment correlation)

See Computational Details for additional details.
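
The statistics listed above for continuous variables can be sketched in Python as follows. This is a minimal illustration using common textbook definitions, not necessarily the exact formulas the program uses. In particular, the relative squared error here is assumed to be the residual sum of squares divided by the sum of squared deviations of the observed values from their mean. The function name `regression_fit_stats` is hypothetical.

```python
import math


def regression_fit_stats(observed, predicted):
    """Return common goodness-of-fit statistics for a continuous variable."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sum_sq_resid = sum(r * r for r in residuals)

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))

    return {
        # Mean square error: average squared residual.
        "mean_square_error": sum_sq_resid / n,
        # Mean absolute error: average absolute residual.
        "mean_absolute_error": sum(abs(r) for r in residuals) / n,
        # Assumed definition: squared error relative to predicting the mean.
        "relative_squared_error": sum_sq_resid / ss_obs,
        # Pearson product-moment correlation of observed and predicted values.
        "correlation": cross / math.sqrt(ss_obs * ss_pred),
    }


stats = regression_fit_stats([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

The sketch does not guard against constant observed or predicted values, for which the relative squared error and the correlation are undefined.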

Goodness of fit statistics for classification problems (for categorical variables)

For categorical variables, the program will compute:

Pearson Chi-square
G-square (maximum likelihood Chi-square)
Percent disagreement (misclassification rate)

See Computational Details for additional details.
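
The statistics listed above for categorical variables can be sketched in Python as follows. The sketch assumes that both Chi-square statistics are computed from the cross-tabulation of observed by predicted classes, with expected frequencies taken from the row and column totals. That is the usual textbook form and may differ in detail from the program's own computations. The function name `classification_fit_stats` is hypothetical.

```python
import math
from collections import Counter


def classification_fit_stats(observed, predicted):
    """Return Pearson Chi-square, G-square, and percent disagreement."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    cell_counts = Counter(zip(observed, predicted))  # observed-by-predicted table
    row_totals = Counter(observed)
    col_totals = Counter(predicted)

    chi_square = 0.0
    g_square = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            count = cell_counts.get((row, col), 0)
            chi_square += (count - expected) ** 2 / expected
            # Empty cells contribute zero to the likelihood-ratio statistic.
            if count > 0:
                g_square += 2.0 * count * math.log(count / expected)

    disagreements = sum(o != p for o, p in zip(observed, predicted))
    return {
        "pearson_chi_square": chi_square,
        "g_square": g_square,
        "percent_disagreement": 100.0 * disagreements / n,
    }


result = classification_fit_stats(["a", "a", "b", "b"], ["a", "a", "b", "a"])
```

In this small example, one of the four cases is misclassified, so the percent disagreement is 25.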
