Feature Selection and Variable Screening

The program will perform the following operations:

 A. Single list of dependent variables:

 1. If a single variable list is specified for the analysis, the program automatically divides (blocks, chunks) the variable list into a number of mapped InputDescriptors of manageable size for downstream analyses. You can then, for example, use the multiple output documents that are produced to compute descriptive statistics for all (possibly many thousands of) variables.
 2. Alternatively, the user can request that the program randomly select a specified number of variables for downstream analyses.
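As a rough illustration of the random selection described in item 2, the following Python sketch draws a fixed number of variable names without replacement. It is not the program's implementation; the function name `select_random_subset`, the `seed` argument, and the variable names `Var1` ... `Var1000` are all hypothetical.

```python
import random

def select_random_subset(variables, n_subset, seed=None):
    """Draw n_subset variables without replacement, keeping the original list order."""
    if n_subset > len(variables):
        raise ValueError("n_subset exceeds the number of variables in the list")
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    chosen = set(rng.sample(range(len(variables)), n_subset))
    return [name for i, name in enumerate(variables) if i in chosen]

# Hypothetical variable list with 1000 entries.
variables = [f"Var{i}" for i in range(1, 1001)]
subset = select_random_subset(variables, 25, seed=42)
print(len(subset))
```

Sampling positions rather than names keeps the selected variables in the same order as the original list, which tends to make downstream results easier to compare.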

 B. Feature Selection for Regression and/or Classification Problems:

 The program uses the Statistica Feature Selection and Variable Screening module to select a subset of the k best predictors for each dependent variable.

Single List

Element Name | Description
Number of vars for subset | Select the number of variables to include in the subset.
Method for single var. list | Select the method for reducing a single list of categorical and/or continuous dependent variables. The following methods are available:

 Divide into several outputs: Choose this method to create multiple data sources for downstream analyses; this option is useful, for example, to compute descriptive statistics for many (thousands of) variables.

 Random selection of subsets: Randomly select the requested number of variables from the list of continuous and categorical dependent variables.
Max. size of a variable list | Select the maximum number of variables for a single analysis during variable selection. This option is only applicable if the 'Method for single var. list' parameter is set to 'Divide into several outputs'; it determines the largest size of a single variable list created by that method.

 Most nodes of Statistica Data Miner will process multiple data sources, so the 'Divide into several outputs' option allows you to perform identical analyses on extremely large lists of variables.
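The effect of 'Divide into several outputs' together with 'Max. size of a variable list' can be sketched in a few lines of Python. This is only an illustration under the assumption that the list is split into consecutive blocks; the function name `divide_into_blocks` and the variable names are hypothetical and do not correspond to any Statistica API.

```python
def divide_into_blocks(variables, max_size):
    """Split a variable list into consecutive blocks of at most max_size entries."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [variables[i:i + max_size] for i in range(0, len(variables), max_size)]

# Hypothetical list of 2500 variables, divided with a maximum block size of 1000.
variables = [f"Var{i}" for i in range(1, 2501)]
blocks = divide_into_blocks(variables, 1000)
print([len(block) for block in blocks])  # [1000, 1000, 500]
```

Each block could then be analyzed as its own data source, with the last block holding whatever remains after the full-size blocks.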

Feature/Predictor Selection

Element Name | Description
Number of vars for subset | Select the number of predictor variables to include in the subset; the program will select the requested number of variables for each dependent variable in the continuous and categorical dependent variable list.
Number of cuts | Select the number of cuts for continuous predictor variables. The algorithm for selecting continuous predictor variables divides the range of values of each variable into k intervals. Select 2 if you only want to test for monotone relationships; select 10 to test for all types of relationships (linear, monotone, non-monotone, etc.).
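To see why the number of cuts matters, consider the following Python sketch. It bins a continuous predictor into k intervals and scores its association with a categorical dependent variable. The source does not say how the intervals are formed or which statistic is used, so equal-width intervals and a Pearson chi-square statistic are assumptions made here for illustration; all function and variable names are hypothetical.

```python
from collections import Counter

def bin_index(value, lo, hi, n_cuts):
    """Map a value to one of n_cuts equal-width intervals over [lo, hi] (assumed scheme)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_cuts)
    return min(idx, n_cuts - 1)  # the maximum value falls into the last interval

def chi_square(x, y, n_cuts):
    """Pearson chi-square statistic between binned predictor x and categorical y."""
    lo, hi = min(x), max(x)
    bins = [bin_index(v, lo, hi, n_cuts) for v in x]
    n = len(x)
    cell, row, col = Counter(zip(bins, y)), Counter(bins), Counter(y)
    stat = 0.0
    for b in row:          # only non-empty intervals appear in row
        for c in col:
            expected = row[b] * col[c] / n
            stat += (cell[(b, c)] - expected) ** 2 / expected
    return stat

def rank_predictors(predictors, y, n_cuts, k):
    """Return the names of the k predictors with the largest chi-square statistic."""
    scores = {name: chi_square(values, y, n_cuts) for name, values in predictors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical data: a U-shaped (non-monotone) relationship between x and y.
x = [i / 100 for i in range(-100, 100)]
y = ["high" if abs(v) > 0.5 else "low" for v in x]
shuffled = [((i * 37) % 200) / 100 - 1 for i in range(200)]  # same values, scrambled order

print(chi_square(x, y, 2), chi_square(x, y, 10))
print(rank_predictors({"u_shaped": x, "shuffled": shuffled}, y, n_cuts=10, k=1))
```

With 2 intervals, both halves of the range contain nearly the same mix of "high" and "low", so the statistic stays close to zero and the U-shaped relationship goes undetected. With 10 intervals, the outer and inner intervals differ sharply and the statistic becomes large.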