Predictor Screening Introductory Overview

Overview

The term curse of dimensionality (Bellman, 1961; Bishop, 1995) generally refers to the difficulties involved in fitting models, estimating parameters, or optimizing a function in many dimensions.

As the dimensionality of the input data space (the number of predictors) increases, it becomes more difficult to find global optima for the parameter space (for instance, to fit models). In practice, the complexity of certain models, for example, neural networks, becomes unmanageable when the number of inputs exceeds a few hundred, or for some models far fewer. Hence, it is simply a practical necessity to screen and select, from among a large set of input (predictor) variables, those that are of likely utility for predicting the outputs (dependent variables) of interest.

The purpose of the Predictor Screening module is to select a set of predictor variables from a large list of candidates, allowing you to focus on a more manageable set for further analysis. The Predictor Screening module optimally bins continuous and categorical predictors and estimates their predictive power with respect to the outcome or dependent variable of interest.

The module constructs the optimal bins using C&RT decision tree methods. Predictive power assessment is dependent upon the nature of the response variable as briefly described below.
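The binning step can be illustrated with a minimal, hedged sketch of C&RT-style recursive partitioning for a continuous predictor and continuous response: each split point is chosen to maximize the reduction in within-bin sum of squared errors. The function names and stopping rule here are illustrative assumptions, not Statistica's actual implementation.

```python
# Sketch of C&RT-style binning: greedily pick the split point that most
# reduces the within-bin sum of squared errors (SSE), repeating until a
# maximum number of bins is reached. Illustrative only.

def sse(ys):
    """Sum of squared deviations from the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(pairs):
    """Return (split_value, sse_after_split) for the best single split,
    or None if the bin cannot be split."""
    pairs = sorted(pairs)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cut, total)
    return best

def bin_edges(pairs, max_bins=4):
    """Recursive partitioning: repeatedly split the bin whose best split
    yields the largest SSE reduction."""
    bins = [sorted(pairs)]
    cuts = []
    while len(bins) < max_bins:
        candidate = None
        for idx, b in enumerate(bins):
            s = best_split(b)
            if s is None:
                continue
            gain = sse([y for _, y in b]) - s[1]
            if candidate is None or gain > candidate[0]:
                candidate = (gain, idx, s[0])
        if candidate is None or candidate[0] <= 0:
            break
        _, idx, cut = candidate
        b = bins.pop(idx)
        bins.insert(idx, [p for p in b if p[0] > cut])
        bins.insert(idx, [p for p in b if p[0] <= cut])
        cuts.append(cut)
    return sorted(cuts)

# Example: the response jumps at x = 5, so the first cut lands at 4.5.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
print(bin_edges(data, max_bins=2))  # -> [4.5]
```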

Continuous dependent variables

For regression problems, Statistica will compute the coefficient of determination and the F-value for each predictor variable.
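These two statistics can be sketched for a single predictor as the R-squared and F-value of a simple linear regression of the response on that predictor. The function below is an illustrative assumption about the computation, not Statistica's code; scipy is used only for the F-distribution p-value.

```python
# Hedged sketch: per-predictor R-squared and F-value for a continuous
# response, via simple linear regression on one predictor.
import numpy as np
from scipy import stats

def screen_continuous(x, y):
    """Return (r_squared, f_value, p_value) for one predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Least-squares fit y = a + b*x (polyfit returns slope first).
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # F-test with 1 and n-2 degrees of freedom.
    f = r2 / (1.0 - r2) * (n - 2)
    p = stats.f.sf(f, 1, n - 2)
    return r2, f, p

# A strongly predictive variable yields a high R-squared and F-value.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(scale=0.5, size=50)
r2, f, p = screen_continuous(x, y)
print(round(r2, 3), round(f, 1))
```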

Categorical dependent variables

For classification problems, Statistica will compute a Chi-square statistic and p-value for each predictor variable.
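As an illustration (not Statistica's implementation), once a predictor is binned, its association with a categorical outcome can be scored with a chi-square test on the bin-by-class contingency table; scipy's `chi2_contingency` does this directly. The counts below are made-up example data.

```python
# Chi-square association between a binned predictor and a categorical
# outcome, via a contingency-table test. Counts are illustrative.
from scipy.stats import chi2_contingency

# Rows = predictor bins, columns = outcome classes (counts).
table = [
    [30, 10],   # bin 1: mostly class A
    [12, 28],   # bin 2: mostly class B
]
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), dof)
```

A large chi-square statistic (small p-value) indicates that the class distribution differs across bins, i.e. the binned predictor carries information about the outcome.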

Binary dependent variables

For binary response variables, Statistica will compute the weight of evidence and information value for each predictor variable.
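A minimal sketch of these two measures, under the usual credit-scoring definitions (WoE per bin is the log of the bin's share of "goods" over its share of "bads"; the information value sums the WoE values weighted by those share differences). The bin counts are made-up example data, and the sketch assumes no empty cells.

```python
# Hedged sketch of weight of evidence (WoE) and information value (IV)
# for a binary response, computed per bin of a predictor.
import math

def woe_iv(bins):
    """bins: list of (n_good, n_bad) counts per bin.
    Returns (per-bin WoE list, total IV)."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        pct_good = good / total_good
        pct_bad = bad / total_bad
        w = math.log(pct_good / pct_bad)   # assumes no empty cells
        woes.append(w)
        iv += (pct_good - pct_bad) * w
    return woes, iv

# Three illustrative bins of (goods, bads):
bins = [(100, 20), (80, 40), (20, 40)]
woes, iv = woe_iv(bins)
print([round(w, 2) for w in woes], round(iv, 2))
```

By convention, larger IV indicates a more predictive variable (rule-of-thumb cutoffs such as IV > 0.3 for "strong" predictors are common in scorecard practice).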