Best-Subset and Stepwise GDA ANCOVA

Best-subset and stepwise discriminant function analysis with categorical factor effects. This node builds a linear discriminant function model for continuous and categorical predictor variables, using ANCOVA-like designs. By default, only main effects are evaluated for categorical predictors; you can also construct factorial designs up to a specified degree (e.g., to degree 3, to include all 2-way and 3-way interactions of categorical predictors). Note that the algorithm for stepwise and best-subset selection of categorical factor effects ensures that complete (possibly multiple-degrees-of-freedom) effects are moved into and out of the model.

The General Discriminant Analysis module provides functionality that makes this technique a general tool for classification and data mining. However, most, if not all, textbook treatments of discriminant function analysis are limited to simple and stepwise analyses with single-degree-of-freedom continuous predictors. The literature offers no experience regarding the robustness and effectiveness of these techniques when they are generalized in the manner provided in this module. Best-subset methods, in particular when used in conjunction with categorical predictors, should therefore be considered a heuristic search method rather than a statistical analysis technique.
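As an illustration of how a factorial degree expands categorical predictors into effects, the following minimal Python sketch enumerates main effects and interactions up to a given degree. It is not Statistica code; the function name factorial_effects and the factor names A, B, and C are hypothetical and used only for this example.

```python
from itertools import combinations


def factorial_effects(factors, degree):
    """Return main effects and interactions up to `degree` as tuples of factor names.

    Hypothetical helper for illustration only; not part of Statistica.
    """
    effects = []
    # An interaction cannot involve more factors than are available.
    for order in range(1, min(degree, len(factors)) + 1):
        effects.extend(combinations(factors, order))
    return effects


# Degree 1 yields main effects only; degree 3 adds all 2-way and 3-way interactions.
main_only = factorial_effects(["A", "B", "C"], 1)
full = factorial_effects(["A", "B", "C"], 3)
labels = ["*".join(effect) for effect in full]
print(labels)
```

Each tuple stands for one complete effect. In a stepwise or best-subset search, such an effect would enter or leave the model as a unit, regardless of how many degrees of freedom it carries.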

General

Element Name	Description
Model building method	Specifies a model building method.
Detail of computed results reported	Specifies the level of detail of the computed results. If Minimal level of detail is requested, the output contains Chi-square tests of roots, discriminant (canonical) function coefficients, factor structure coefficients, and classification function coefficients. If All results is requested, Statistica will also report various descriptive statistics and classification summary statistics. Classification statistics for each case can be requested separately as an option.
Construct factorial to degree	Specifies the factorial degree of the design to be tested; Statistica will construct an ANCOVA-like factorial design for all categorical predictors up to the specified degree (i.e., by default up to degree 1, so that the final model will include only main effects for categorical predictors; if you set this parameter to 2, then all two-way interactions will also be included, and so on).
Priors	Specifies the prior classification probabilities used for classifying observations. The default setting is Estimated, which sets the prior probabilities proportional to the observed group (class) sizes; select Equal to assign equal probabilities to each group or class specified in the categorical dependent variable.
Case statistics	Creates and reports selected case statistics.
Sweep delta 1.E-	Specifies the negative exponent for a base-10 constant Delta (delta = 10^-sdelta); the default value is 7. Delta is used (1) in sweeping, to detect redundant columns in the design matrix, and (2) for evaluating the estimability of hypotheses; specifically a value of 2*delta is used for the estimability check.
Inverse delta 1.E-	Specifies the negative exponent for a base-10 constant Delta (delta = 10^-idelta); the default value is 12. Delta is used to check for matrix singularity in matrix inversion calculations.
Generates data source, if N for input less than	Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field; note that parameter k (number of observations) will be evaluated against the number of observations in the input data source, not the number of valid or selected observations.
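The Priors and delta settings described above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names prior_probabilities and delta_from_exponent and the sample group labels are hypothetical, and the code does not reproduce Statistica's internal computations.

```python
from collections import Counter


def prior_probabilities(groups, method="estimated"):
    """Return a dict of prior probabilities per group (hypothetical helper).

    "estimated": proportional to the observed group sizes.
    "equal":     the same probability for every group.
    """
    counts = Counter(groups)
    if method == "estimated":
        total = len(groups)
        return {g: n / total for g, n in counts.items()}
    if method == "equal":
        return {g: 1.0 / len(counts) for g in counts}
    raise ValueError(f"unknown method: {method!r}")


def delta_from_exponent(exponent):
    """Convert the negative exponent entered in the dialog into delta = 10^-exponent."""
    return 10.0 ** (-exponent)


groups = ["g1", "g1", "g1", "g2"]  # made-up class labels
print(prior_probabilities(groups, "estimated"))
print(prior_probabilities(groups, "equal"))

sweep_delta = delta_from_exponent(7)     # default sweep exponent per the table
inverse_delta = delta_from_exponent(12)  # default inverse exponent per the table
estimability_tolerance = 2 * sweep_delta  # 2*delta is used for the estimability check
print(sweep_delta, inverse_delta, estimability_tolerance)
```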

Parameters for Stepwise Selection

Element Name	Description
Stepwise selection criterion	Specifies the criterion to use for stepwise selection of predictors. Note that the F statistic is only available for designs that do not include categorical factor effects.
p to enter	Specifies p-to-enter for stepwise selection of predictors.
p to remove	Specifies p-to-remove for stepwise selection of predictors.
F to enter	Specifies F-to-enter for stepwise selection of predictors (available for single continuous dependent variables only). Note that the F statistic is only available for designs that do not include categorical factor effects.
F to remove	Specifies F-to-remove for stepwise selection of predictors (available for single continuous dependent variables only). Note that the F statistic is only available for designs that do not include categorical factor effects.
Maximum number of steps	Specifies the maximum number of steps for stepwise selection of variables.
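To show how p to enter, p to remove, and the maximum number of steps interact, here is a minimal Python sketch of a generic stepwise loop. It is an assumption-laden illustration, not Statistica's algorithm: the function stepwise_select, the p_value callable, the 0.05 thresholds, and the toy p-values are all hypothetical. Each candidate represents one complete effect, which is entered or removed as a whole.

```python
def stepwise_select(candidates, p_value, p_enter=0.05, p_remove=0.05, max_steps=100):
    """Generic stepwise selection sketch (hypothetical, for illustration).

    `p_value(effect, others)` must return the p-value of `effect` given
    the list `others` of effects already in the model.
    """
    model = []
    for _ in range(max_steps):
        # Removal check: drop the least significant effect if it exceeds p_remove.
        if model:
            def removal_p(effect):
                return p_value(effect, [m for m in model if m != effect])
            worst = max(model, key=removal_p)
            if removal_p(worst) > p_remove:
                model.remove(worst)
                continue
        # Entry check: add the most significant remaining effect if below p_enter.
        remaining = [c for c in candidates if c not in model]
        if not remaining:
            break
        best = min(remaining, key=lambda effect: p_value(effect, model))
        if p_value(best, model) < p_enter:
            model.append(best)
        else:
            break  # nothing qualifies for entry or removal
    return model


# Toy p-values that ignore the current model, purely for demonstration.
toy_p = {"x1": 0.01, "x2": 0.20, "A*B": 0.03}
selected = stepwise_select(list(toy_p), lambda effect, others: toy_p[effect])
print(selected)
```

In this sketch every entry or removal counts as one step, so a small max_steps value stops the search early.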

Parameters for Best-Subset Selection

Element Name	Description
Best subsets measure	Specifies the selection criterion for best subset selection of predictors. To use cross-validation misclassification rates, a cross-validation variable (learning sample) must be specified.
Start for best subsets	Specifies the smallest number of predictors to be included in the model chosen via best subset selection, i.e., the start of the search for the best subset of predictors.
Stop for best subsets	Specifies the maximum number of predictors to be included in the model chosen via best subset selection.
Number of subsets to display	Specifies the number of subsets to display in the results; Statistica will keep a log of the best k predictor models of any given size, using k as specified by this parameter.
Number of variables to force	Specifies the number of predictors to force into the model, i.e., to include in all models considered during the best-subset selection of predictors. Statistica will force the first k predictors in the list of continuous predictors into the model, with k as specified here.
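The best-subset parameters above can be pictured with the following minimal Python sketch of an exhaustive search. It is a hedged illustration, not Statistica's implementation: the function best_subsets, the score callable (lower is better, standing in for whatever measure is selected), and the toy costs are hypothetical. The sketch also assumes that the start and stop sizes count the forced predictors.

```python
from itertools import combinations
import heapq


def best_subsets(predictors, score, start, stop, n_display, n_forced=0):
    """Exhaustive best-subset search sketch (hypothetical, for illustration).

    The first `n_forced` predictors are included in every subset. For each
    subset size from `start` to `stop`, the `n_display` lowest-scoring
    subsets are kept.
    """
    forced = tuple(predictors[:n_forced])
    free = predictors[n_forced:]
    results = {}
    for size in range(max(start, n_forced), stop + 1):
        n_free = size - n_forced
        if n_free > len(free):
            break  # not enough predictors left to build a subset of this size
        subsets = [forced + combo for combo in combinations(free, n_free)]
        results[size] = heapq.nsmallest(n_display, subsets, key=score)
    return results


# Toy score: the sum of made-up per-predictor costs (lower is better).
cost = {"x1": 3, "x2": 1, "x3": 2}


def toy_score(subset):
    return sum(cost[p] for p in subset)


unforced = best_subsets(["x1", "x2", "x3"], toy_score, start=1, stop=2, n_display=1)
forced_first = best_subsets(["x1", "x2", "x3"], toy_score, start=1, stop=2,
                            n_display=1, n_forced=1)
print(unforced)
print(forced_first)
```

Because the number of subsets grows combinatorially with the number of predictors, narrowing the start and stop sizes keeps such a search manageable.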

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name	Description
Generates C/C++ code	Generates C/C++ code for deployment of predictive model.
Generates SVB code	Generates Statistica Visual Basic code for deployment of predictive model.
Generates PMML code	Generates PMML (Predictive Model Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
Saves C/C++ code	Saves C/C++ code for deployment of predictive model.
File name for C/C++ code	Specifies the name and location of the file in which to save the (C/C++) deployment code information.
Saves SVB code	Saves Statistica Visual Basic code for deployment of predictive model.
File name for SVB code	Specifies the name and location of the file in which to save the (SVB/VB) deployment code information.
Saves PMML code	Saves PMML (Predictive Model Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
File name for PMML (XML) code	Specifies the name and location of the file in which to save the (PMML/XML) deployment code information.
