Feature Selection and Root Cause Analysis

The program uses the Statistica Feature Selection and Variable Screening module to select a subset of the k best predictors for each dependent variable. The search is performed predictor by predictor; that is, this node performs a fast first-order search (no interactions are considered) of potential predictors of the outcome variable(s) of interest.
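The idea of a first-order, predictor-by-predictor search can be sketched in a few lines of Python. This is only an illustration, not the module's actual implementation: the function names (bin_index, f_statistic, select_k_best), the equal-width binning, and the use of a one-way ANOVA F statistic as the ranking score are all assumptions made for the example.

```python
import random
from statistics import mean

def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width intervals over [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum falls into the last interval

def f_statistic(x, y, n_bins):
    """One-way ANOVA F of y across equal-width bins of x (assumed score)."""
    lo, hi = min(x), max(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(bin_index(xi, lo, hi, n_bins), []).append(yi)
    grand = mean(y)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((yi - mean(g)) ** 2 for g in groups.values() for yi in g)
    df_between = len(groups) - 1
    df_within = len(y) - len(groups)
    if df_between == 0 or df_within <= 0:
        return 0.0
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)

def select_k_best(predictors, y, k, n_bins=10):
    """Score every predictor on its own (no interactions), keep the top k."""
    scores = {name: f_statistic(x, y, n_bins) for name, x in predictors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Synthetic data: x1 acts linearly, x2 acts through a U-shape, noise is unrelated.
random.seed(0)
n = 500
x1 = [random.uniform(-1, 1) for _ in range(n)]
x2 = [random.uniform(-1, 1) for _ in range(n)]
noise = [random.uniform(-1, 1) for _ in range(n)]
y = [3 * a + 3 * b * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

selected, scores = select_k_best({"x1": x1, "x2": x2, "noise": noise}, y, k=2)
print(selected)
```

Because each predictor is scored independently, the cost grows linearly with the number of predictors, which is what makes this kind of screening fast. The trade-off is that a predictor that matters only in combination with another one can be missed.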

Feature/Predictor Selection

Element Name | Description
Detail of computed results reported | Specifies the level of detail of the computed results that are reported. If All results is requested, the final list of selected predictors is also reported as a text string of variable numbers, which can then be copied and pasted into other analyses.
Predictor selection | Selects either a fixed number of predictors or only those predictors with p < a user-defined value.
Number of vars for subset | Specifies the number of predictor variables to include in the subset. This option applies only if the Predictor selection option is set to Fixed number. The program selects the requested number of variables for each dependent variable in the continuous and categorical dependent variable list.
P for predictor selection | Specifies a p-value for selecting predictors. The program reports results only for those predictors that are related to the dependent variables with p < value. Interpret this p-value with caution: the customary interpretation of p as an alpha error rate is not appropriate in this context, where multiple post-hoc statistical tests are performed.
Number of cuts | Specifies the number of cuts for continuous predictor variables. The selection algorithm divides the range of values of each continuous predictor variable into k intervals. Select 2 if you only want to test for monotone relationships; select 10 to test for all types of relationships (linear, monotone, non-monotone, etc.).
Casewise MD deletion | Applies casewise deletion of missing data (MD). If TRUE, a case is excluded if it has missing data for any predictor.
Treat continuous dependents as categorical | Treats the continuous dependent variables in the analyses as categorical variables. When this option is TRUE, the program first bins the values of the dependent variables into the specified number of categories and then performs the computations as for categorical dependent variables. This option is recommended when the continuous dependent variables are highly skewed or contain extreme outliers; such outliers may otherwise cause the program to select the predictors that most clearly separate the outliers from the remaining (non-outlier) data points. Binning the values of the continuous dependent variables into categories automatically pulls in the outliers and diminishes their overall influence on the analyses.
Number of cuts for continuous dependents | Specifies the number of cuts for continuous dependent variables. The program divides the range of values of each continuous dependent variable into k intervals. Select 2 if you only want to test for monotone relationships; select 10 to test for all types of relationships (linear, monotone, non-monotone, etc.).
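The guidance on the number of cuts (2 for monotone relationships only, 10 for all types) can be illustrated with a small synthetic experiment. The sketch below is hypothetical: it assumes equal-width intervals and uses eta-squared (the share of outcome variance explained by the interval means) as a stand-in for whatever statistic the program actually computes. The function name eta_squared and the test data are invented for the example.

```python
import random
from statistics import mean

def eta_squared(x, y, n_intervals):
    """Share of the variance of y explained by equal-width intervals of x.

    Assumes x is not constant (otherwise the interval width would be zero).
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_intervals
    groups = {}
    for xi, yi in zip(x, y):
        idx = min(int((xi - lo) / width), n_intervals - 1)
        groups.setdefault(idx, []).append(yi)
    grand = mean(y)
    ss_total = sum((yi - grand) ** 2 for yi in y)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(1000)]
y_linear = [xi + random.gauss(0, 0.1) for xi in x]       # monotone relationship
y_ushape = [xi * xi + random.gauss(0, 0.1) for xi in x]  # non-monotone relationship

for label, y in (("linear", y_linear), ("U-shaped", y_ushape)):
    two = eta_squared(x, y, 2)
    ten = eta_squared(x, y, 10)
    print(f"{label}: 2 intervals -> {two:.2f}, 10 intervals -> {ten:.2f}")
```

In this setup the linear relationship is visible with either setting, while the U-shaped relationship is nearly invisible with 2 intervals (both halves of the range have almost the same mean outcome) and clearly detected with 10. More intervals cost more degrees of freedom, so fewer cases fall into each interval.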