Feature Selection and Variable Screening - Computational Details

The algorithm for feature selection and variable screening can be applied to regression-type problems (continuous dependent variable) as well as classification-type problems (categorical dependent variable).

Continuous Dependent Variables

By default, for continuous dependent variables, Statistica computes the ratio of the between-category variance to the within-category variance of the dependent variable, where the categories are intervals of each predictor formed according to the predictor's type (continuous vs. categorical). Optionally, continuous dependent variables can be "binned" and thereby converted to categorical variables for Feature Selection analyses; in that case, the analyses proceed as described below for categorical dependent variables. This option is particularly useful when the continuous dependent variable is highly skewed or contains extreme outliers.

For continuous predictors, the program divides the range of values in each predictor into k intervals (10 by default; to "fine-tune" the sensitivity of the algorithm to different types of monotone and/or non-monotone relationships, the user can change this value in the Feature Selection and Variable Screening Startup Panel). The continuous predictors are recoded with an algorithm that uses equal percentile binning. Categorical predictors are not transformed in any way. The FSL Results dialog box provides options to sort the list of F and p-values computed for each predictor, so that the best predictors can be reviewed using either the F or the p-value as the criterion of predictor importance.
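The screening statistic for a continuous dependent variable can be illustrated with a minimal Python sketch. It is not Statistica's implementation: it assumes the variance ratio is the standard one-way ANOVA F statistic, and it approximates equal percentile binning with simple rank-based cut points. The function names percentile_bins and f_ratio are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile_bins(values, k=10):
    """Recode a continuous predictor into at most k bins of roughly equal count.

    Assumption: cut points are taken at the j/k positions of the sorted
    values; the exact rule used by Statistica is not documented here.
    """
    s = sorted(values)
    n = len(s)
    cuts = [s[(j * n) // k] for j in range(1, k)]
    # Tied values always fall into the same bin.
    return [bisect_right(cuts, v) for v in values]


def f_ratio(groups, y):
    """One-way ANOVA F: between-category variance over within-category variance.

    Requires at least two categories, more cases than categories, and
    some variation of y within the categories.
    """
    by_group = defaultdict(list)
    for g, v in zip(groups, y):
        by_group[g].append(v)

    n = len(y)
    m = len(by_group)
    grand_mean = sum(y) / n

    ss_between = 0.0
    ss_within = 0.0
    for vs in by_group.values():
        mean = sum(vs) / len(vs)
        ss_between += len(vs) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in vs)

    ms_between = ss_between / (m - 1)  # between-category variance
    ms_within = ss_within / (n - m)    # within-category variance
    return ms_between / ms_within


# Usage: bin a continuous predictor, then score it against the dependent variable.
x = [float(i) for i in range(100)]
y = [3.0 * v + (i % 7) for i, v in enumerate(x)]
score = f_ratio(percentile_bins(x, k=10), y)
```

A categorical predictor would be passed to f_ratio directly, without binning. Under the same ANOVA assumption, a p-value could be obtained from the F distribution with m - 1 and n - m degrees of freedom, for example with scipy.stats.f.sf.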

Categorical Dependent Variables

For classification problems (or regression problems in which the continuous dependent variable is optionally "binned" and hence turned into a categorical variable for the analyses), the program computes a Chi-square statistic and p-value for each predictor variable. For continuous predictors, the program divides the range of values in each predictor into k intervals (10 by default; to "fine-tune" the sensitivity of the algorithm to different types of monotone and/or non-monotone relationships, the user can change this value in the Feature Selection and Variable Screening Startup Panel). Categorical predictors are not transformed in any way. The FSL Results dialog box provides options to sort the list of Chi-square and p-values computed for each predictor, so that the best predictors can be reviewed using either the Chi-square or the p-value as the criterion of predictor importance.
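The Chi-square screening step can be sketched in the same spirit. The sketch below assumes the statistic is the Pearson Chi-square for the cross-tabulation of a predictor's categories (or intervals) against the dependent variable's classes; the source does not say which Chi-square variant Statistica uses. The function name chi_square is hypothetical, and the inputs are assumed to be category labels already, so a continuous predictor would first have to be recoded into its k intervals.

```python
from collections import Counter


def chi_square(x_cats, y_cats):
    """Pearson Chi-square statistic and degrees of freedom for two label lists."""
    n = len(y_cats)
    row_totals = Counter(x_cats)            # counts per predictor category
    col_totals = Counter(y_cats)            # counts per dependent-variable class
    observed = Counter(zip(x_cats, y_cats)) # cell counts of the cross-tabulation

    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n        # expected count under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Usage: a predictor that separates the classes perfectly vs. one unrelated to them.
classes = ["u", "u", "v", "v"]
strong = chi_square(["a", "a", "b", "b"], classes)
weak = chi_square(["a", "b", "a", "b"], classes)
```

Sorting predictors by the statistic in descending order (or by p-value in ascending order) gives the kind of ranking described above. Under the Pearson assumption, the p-value could be computed from the Chi-square distribution with the returned degrees of freedom, for example with scipy.stats.chi2.sf.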