Feature & Model Selection; Classification

Double-click Feature and method selection: Classification on the Feature & Method Selection Startup Panel - Quick tab to display the Feature & Model Selection; Classification dialog box. Alternatively, select Feature and method selection: Classification and click the OK button. The dialog box contains two tabs: Quick and Advanced.

In addition to feature selection methods based on those described in detail in the documentation of the Feature Selection and Variable Screening module, STATISTICA Process Optimization can simultaneously evaluate the importance of predictors and the types (classes) of models that are most likely to identify those predictors. Specifically, the program can automatically fit linear models [stepwise regression or stepwise linear discriminant function analysis; see General Regression Models (GRM) and General Discriminant Analysis (GDA)], classification and regression trees [see General Classification and Regression Trees (GC&RT)], Multivariate Adaptive Regression Splines (MARSplines), boosted tree models (see Boosted Trees; this is an implementation of stochastic gradient boosting), and various neural network architectures. Process Optimization then ranks the importance of each predictor in each type of model, and reports those rankings together with the average importance ranking over all models. Optionally, the program can also report a ranking of the models based on simple summary statistics (R-square, misclassification rate) for each model. In summary, these options enable you to evaluate the importance of the predictors in different types of statistical or machine learning models, and to determine which type or class of model is best suited for analyzing those predictors.
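
The ranking logic described above can be illustrated with a minimal Python sketch. It is not the STATISTICA implementation: the model names, predictor names, importance scores, and misclassification rates below are hypothetical values chosen for illustration, and the tie handling (tied predictors share the average rank) is an assumption.

```python
from statistics import mean

def rank_predictors(importance):
    """Return {predictor: rank}, 1 = most important; ties share the average rank."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over any run of predictors with the same importance score.
        while j + 1 < len(ordered) and importance[ordered[j + 1]] == importance[ordered[i]]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for name in ordered[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks

def average_ranks(importance_by_model):
    """Average each predictor's rank over all model types."""
    per_model = {m: rank_predictors(s) for m, s in importance_by_model.items()}
    predictors = next(iter(per_model.values())).keys()
    return {p: mean(r[p] for r in per_model.values()) for p in predictors}

# Hypothetical importance scores, one dictionary per model type.
importance_by_model = {
    "stepwise_linear": {"temperature": 0.9, "pressure": 0.4, "speed": 0.1},
    "classification_tree": {"temperature": 0.7, "pressure": 0.8, "speed": 0.2},
    "boosted_trees": {"temperature": 1.0, "pressure": 0.6, "speed": 0.3},
}

# Hypothetical misclassification rates (lower is better).
misclassification = {
    "stepwise_linear": 0.21,
    "classification_tree": 0.18,
    "boosted_trees": 0.12,
}

avg = average_ranks(importance_by_model)
predictor_order = sorted(avg, key=avg.get)
model_order = sorted(misclassification, key=misclassification.get)

for name in predictor_order:
    print(f"{name}: average rank {avg[name]:.2f}")
print("models, best first:", model_order)
```

Averaging ranks rather than raw importance scores keeps the model types comparable, because each type may report importance on a different scale.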

Summary
Click the Summary button to compute the results. If no variables have been specified yet, the program will display the standard three-variable selection dialog box, where you can select the categorical (class) dependent (outcome) variables and the continuous and/or categorical predictor variables for the analyses. The analyses will be performed for each dependent variable, one at a time.

If variables have already been selected, then clicking the Summary button will begin the analyses and display the requested results. Use the Models options on the Advanced tab to specify which types of models to include (evaluate) in the analyses; use the Reports options to select which results to report.

Cancel
Click the Cancel button to close the dialog box without performing an analysis.
Options
Click the Options button to display the Options menu.
Open Data
Click the Open Data button to display the Select Data Source dialog box, which is used to choose the spreadsheet on which to perform the analysis. The Select Data Source dialog box contains a list of the spreadsheets that are currently active.
Select Cases
Click the Select Cases button to display the Analysis/Graph Case Selection Conditions dialog box, which is used to create conditions that determine which cases will be included in (or excluded from) the current analysis. More information is available in the Case Selection Conditions Overview, Case Selection Conditions Syntax Summary, and Case Selection Conditions dialog box description.
W
Click the W (Weight) button to display the Analysis/Graph Case Weights dialog box, which is used to adjust the contribution of individual cases to the outcome of the current analysis by "weighting" those cases in proportion to the values of a selected variable.
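
To illustrate what weighting cases means for a summary statistic, the following Python sketch computes a misclassification rate in which each case counts in proportion to its weight. This is an illustration only: the function name, the data, and the way the weights enter the statistic are assumptions, not a description of how STATISTICA applies case weights internally.

```python
def weighted_misclassification_rate(observed, predicted, weights):
    """Share of the total case weight that falls on misclassified cases."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    wrong = sum(w for o, p, w in zip(observed, predicted, weights) if o != p)
    return wrong / total

# Hypothetical data: four cases, two of them misclassified.
observed = ["good", "good", "bad", "bad"]
predicted = ["good", "bad", "bad", "good"]
weights = [1, 3, 1, 1]  # values of the hypothetical weighting variable

unweighted = weighted_misclassification_rate(observed, predicted, [1, 1, 1, 1])
weighted = weighted_misclassification_rate(observed, predicted, weights)

print(f"unweighted: {unweighted:.3f}")
print(f"weighted:   {weighted:.3f}")
```

Here the second case carries three times the weight of the others, so misclassifying it raises the weighted rate above the unweighted one.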
MD casewise deletion
If casewise deletion of missing data is selected, STATISTICA will exclude from the analyses all cases (observations) that have missing data for any of the variables selected for the analyses. Casewise deletion may not be desirable when many predictor variables are specified, each with many missing values; in that case, practically all observations may be excluded from the analyses. An alternative strategy is to first perform an initial feature (variable) selection without selecting this option, followed by a second analysis with casewise deletion of missing data for only those variables selected in the first pass.
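
The effect described above can be seen in a small Python sketch. The data, the variable names, and the set of predictors assumed to survive the first pass are hypothetical; missing values are represented by None.

```python
def complete_cases(rows, variables):
    """Casewise deletion: keep only rows with no missing value in any listed variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]

# Hypothetical data set: y is the class variable, x1..x3 are predictors.
rows = [
    {"y": "a", "x1": 1.0, "x2": None, "x3": 2.0},
    {"y": "b", "x1": 2.0, "x2": 5.0, "x3": None},
    {"y": "a", "x1": None, "x2": 1.0, "x3": 3.0},
    {"y": "b", "x1": 4.0, "x2": None, "x3": None},
    {"y": "a", "x1": 5.0, "x2": 2.0, "x3": 1.0},
]

# Casewise deletion over all variables leaves very few cases.
kept_all = complete_cases(rows, ["y", "x1", "x2", "x3"])

# Assume a first pass (run without casewise deletion) retained only x1.
selected = ["y", "x1"]
kept_selected = complete_cases(rows, selected)

print(len(kept_all), "of", len(rows), "cases kept with all variables")
print(len(kept_selected), "of", len(rows), "cases kept with selected variables")
```

Restricting casewise deletion to the variables retained in the first pass keeps more cases, because a case is no longer dropped for missing values in predictors that are not used.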