Feature Selection and Variable Screening - Quick Tab
Ribbon bar. Select the Data Mining tab. In the Tools group, click Feature Selection, and from the menu, select Feature Selection to display the Feature Selection and Variable Screening dialog box.
Classic menus. On the Data Mining menu, select Feature Selection and Variable Screening to display the Feature Selection and Variable Screening dialog box.
The dialog box contains one tab: Quick. Use the options on this tab to select the variables to include in the analysis. There is no limit on the number of predictor variables that can be specified. Note that separate results are computed for each dependent variable. Hence, memory limitations may be encountered when processing large lists of dependent and independent variables in a single analysis, in particular when there are many categorical predictors with many categories. In that case, perform the feature selection analysis one dependent variable at a time.
See also the Feature Selection and Variable Screening Index and Overviews.
- Variables
- Click the Variables button to display the standard variable selection dialog box. As in Statistica Data Miner, you can select continuous and categorical dependent variables, and continuous and categorical predictors. Continuous variables are those that are measured on some continuous scale (e.g., Height, Weight); categorical variables contain indicators, codes, or text values that denote membership in some group or class (e.g., Gender: Male, Female). Different statistics are computed for categorical dependent variables (classification problems) and continuous dependent variables (regression-type problems); refer also to Computational Details.
- Count Variable
- Click this button to display a standard variable selection dialog box where you can specify a count variable (just as in Statistica Data Miner). The values in this variable are used during the computations as a simple case multiplier.
- Number of cuts for continuous predictors
- This option is used to specify the "coarseness of the grid" applied to the continuous predictors. As described in Computational Details, the program divides the range of values for each continuous predictor into k intervals and computes statistics based on the means or frequencies in those intervals for regression or classification-type problems, respectively. Therefore, if you specify k=2 in this edit field, the range of values for each continuous predictor variable is split into 2 categories, and the variable screening detects only simple monotone (e.g., linear) relationships to the dependent variables. If you specify k=3, simple non-monotone (e.g., quadratic) relationships are picked up as well. The default value (10) is well suited to screening for practically all types of monotone or complex non-monotone relationships, and hence does not bias the selection of variables in favor of any particular subsequent analysis that may be applied for predictive data mining.
- Treat continuous dependent variable as categorical
- Set this option to treat the continuous dependent variables in the analyses as categorical variables. When this option is selected, the program first "bins" the values of the dependent variables into the specified number of categories, and then performs the computations for categorical dependent variables as described in Computational Details. This option is recommended when the continuous dependent variables are highly skewed or contain extreme outliers; such outliers may otherwise cause the program to select those predictors that most clearly separate the outliers from the remaining (non-outlier) data points. The process of binning the values in the continuous dependent variables into categories automatically "pulls in" the outliers and diminishes their overall influence on the analyses.
- Casewise deletion of missing data
- If the Casewise deletion of missing data check box is selected, Statistica ignores all cases (observations) that have missing data for any of the variables selected for the analyses. Casewise deletion may not be desirable when many predictor variables are specified, each with many missing values; in that case, practically all observations may be excluded from the analyses. An alternative strategy is to first perform an initial feature (variable) selection without selecting this option, followed by a second analysis with casewise deletion of missing data for only those variables selected in the first pass.
- OK
- Click the OK button to start the computations and to display the FSL Results dialog box. If no variables have been specified when this button is clicked, the variable selection dialog box is displayed before data processing starts.
- Cancel
- Click the Cancel button to close the dialog box without performing an analysis.
- Options
- See Options Menu for descriptions of the commands on this menu.
- Open Data
- Click the Open Data button to display the Select Data Source dialog box, which is used to choose the spreadsheet on which to perform the analysis. The Select Data Source dialog box contains a list of the spreadsheets that are currently active.
- Select Cases
- Click the Select Cases button to display the Analysis/Graph Case Selection Conditions dialog box, which is used to define conditions that determine which cases are included in (or excluded from) the current analysis. More information is available in the case selection conditions overview and syntax summary.
- W
- Click the W (Weight) button to display the Analysis/Graph Case Weights dialog box, which is used to adjust the contribution of individual cases to the outcome of the current analysis by weighting those cases in proportion to the values of a selected variable. Note that Statistica Feature Selection and Variable Screening accepts fractional weights, and treats them as multipliers in the analyses.
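
To illustrate what "treated as multipliers" means for the count variable and for case weights, here is a minimal Python sketch. It is not Statistica code; the function name `weighted_mean` and the sample values are hypothetical, and the statistics Statistica actually computes are described in Computational Details, not here.

```python
def weighted_mean(values, weights):
    """Mean of `values` in which case i counts `weights[i]` times.

    Weights may be integer counts or fractional weights; both act as
    simple multipliers on the contribution of each case.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# An integer count of 3 is equivalent to listing the case three times.
replicated = sum([10, 10, 10, 20]) / 4
print(weighted_mean([10, 20], [3, 1]), replicated)

# Fractional weights are used the same way, as multipliers.
print(weighted_mean([10, 20], [0.5, 1.5]))
```

In this sketch, an integer count and replicated rows give the same result, which is the sense in which the count variable is a "simple case multiplier".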
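
The effect of the Number of cuts for continuous predictors option can be sketched in a few lines of Python. This sketch assumes equal-width intervals over the observed range of the predictor, which is an assumption (the exact interval rule is given in Computational Details); the helper names `bin_index` and `interval_means` and the data are hypothetical. It shows why k=2 can miss a symmetric quadratic relationship that k=3 picks up.

```python
def bin_index(x, lo, hi, k):
    """Index (0..k-1) of the equal-width interval of [lo, hi] that contains x."""
    if hi == lo:
        return 0
    i = int((x - lo) * k / (hi - lo))
    return min(i, k - 1)  # the maximum value belongs to the last interval


def interval_means(xs, ys, k):
    """Mean of the dependent variable ys within each of k intervals of xs."""
    lo, hi = min(xs), max(xs)
    sums = [0.0] * k
    counts = [0] * k
    for x, y in zip(xs, ys):
        i = bin_index(x, lo, hi, k)
        sums[i] += y
        counts[i] += 1
    # Empty intervals have no mean; report them as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


xs = [-5, -4, -1, 1, 4, 5]
ys = [x * x for x in xs]  # a purely quadratic (non-monotone) relationship

print(interval_means(xs, ys, 2))  # both halves have the same mean
print(interval_means(xs, ys, 3))  # the middle interval differs from the outer two
```

With two intervals the means are identical, so a screening statistic based on them sees no relationship; with three intervals the low middle mean reveals the curvature.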
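
The way binning "pulls in" outliers under the Treat continuous dependent variable as categorical option can be illustrated as follows. This sketch assumes rank-based, equal-frequency bins, which is only one possible binning rule and is not confirmed by this topic; the function name `equal_frequency_bins` and the data are hypothetical.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a category 0..n_bins-1 based on its rank.

    Each category receives roughly the same number of cases. Tied values
    may fall into different categories in this simple sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    categories = [0] * len(values)
    for rank, i in enumerate(order):
        categories[i] = rank * n_bins // len(values)
    return categories


# One extreme outlier (250.0) among otherwise similar values.
y = [3.1, 2.7, 3.4, 2.9, 250.0, 3.0]
categories = equal_frequency_bins(y, 3)
print(categories)
```

After binning, the outlier simply shares the highest category with the next-largest value, so its magnitude no longer dominates the analysis.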
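
The trade-off described for Casewise deletion of missing data, and the two-pass strategy suggested there, can be sketched as follows. Missing values are represented as `None`; the function name `complete_cases`, the variable names, and the data are hypothetical, and the choice of variables "selected in the first pass" is made by hand here purely for illustration.

```python
def complete_cases(rows, variables):
    """Return the rows that have no missing (None) value in any of `variables`."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


rows = [
    {"y": 1.0, "x1": 0.2, "x2": None, "x3": 5.0},
    {"y": 2.0, "x1": None, "x2": 1.1, "x3": 6.0},
    {"y": 3.0, "x1": 0.4, "x2": 1.3, "x3": None},
    {"y": 4.0, "x1": 0.5, "x2": 1.4, "x3": 7.0},
]

# Casewise deletion over every variable keeps only fully observed cases.
all_variables = ["y", "x1", "x2", "x3"]
print(len(complete_cases(rows, all_variables)))

# Second pass: casewise deletion over only the variables kept after screening
# (assumed here to be y and x1) retains more cases.
selected = ["y", "x1"]
print(len(complete_cases(rows, selected)))
```

Each additional variable in the list can only reduce the number of complete cases, which is why restricting casewise deletion to the variables retained after a first screening pass preserves more data.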