Best Subset Regression

Click the Best subs button on the General Discriminant Analysis (GDA) Models Syntax Editor - Keywords tab to display the Best Subset Regression dialog. This dialog contains keywords and specifications (options) for specifying a best subset analysis; see the description of the GDA Syntax for a list of all keywords (see also the Syntax Editor dialog). Click on a button to insert the respective keyword, option, symbol, or value at the current cursor location in the Analysis syntax edit field on the General Discriminant Analysis (GDA) Models Syntax Editor dialog.

Keywords

The Keywords group box contains the following keywords:

BESTCRIT

BESTCRIT [=] { LAMBDA   };
             { MISCLASS };
             { CROSSVAL };

Example. BESTCRIT = LAMBDA;

Optional keyword used in conjunction with the MBUILD option bestsubset. Specify the criterion that is to be used for comparing models (with different subsets of effects) during the best subset regression computations. The default specification is LAMBDA (use the ordinary Wilks' Lambda value). Use MISCLASS to base the selection of predictor effects on the misclassification rate of cases in the sample from which the parameter estimates were computed. Use CROSSVAL to base the selection of predictor effects on the misclassification rate for cases in a cross-validation sample, i.e., cases that were not included in the estimation of the parameters (see below for further details). For more information on model building methods, see the Introductory Overview section.

Applies to. GDA

START

START [=] Integer Value;

Example. START = 6;

Optional keyword used in conjunction with the MBUILD option bestsubset; see also the STOP keyword below. The START and STOP values determine the sizes of the subsets that will be considered during the search through all possible subsets. The program will begin the search with the subset size specified with the START keyword, and will terminate the search after all subsets of the size specified via STOP have been evaluated.

Applies to. GRM, GDA

STOP

STOP [=] Integer Value;

Example. STOP = 10;

Optional keyword used in conjunction with the MBUILD option bestsubset; see also the START keyword above. The START and STOP values determine the sizes of the subsets that will be considered during the search through all possible subsets. The program will begin the search with the subset size specified with the START keyword, and will terminate the search after all subsets of the size specified via STOP have been evaluated.

Applies to. GRM, GDA
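
The way START and STOP bound the search can be illustrated outside the program. The following Python sketch is not the program's own implementation; it assumes that START is no larger than STOP and that every subset size in between is visited in increasing order. The function name candidate_subsets and the effect names are hypothetical.

```python
from itertools import combinations

def candidate_subsets(effects, start, stop):
    """Yield every subset of `effects` whose size lies between
    `start` and `stop`, inclusive, smallest sizes first."""
    for size in range(start, stop + 1):
        for subset in combinations(effects, size):
            yield subset

# Hypothetical effect names; START = 2 and STOP = 3.
effects = ["A", "B", "C", "D"]
subsets = list(candidate_subsets(effects, start=2, stop=3))

# 6 subsets of size 2 plus 4 subsets of size 3.
print(len(subsets))
```

Because the number of subsets grows quickly with the number of effects, narrowing the range between START and STOP is the main way to keep the search manageable.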

MAXSUB

MAXSUB [=] Integer Value;

Example. MAXSUB = 10;

Optional keyword used in conjunction with the MBUILD option bestsubset. The integer value specified with this keyword determines the number of subsets (of each size, if the RSQUARED option has been specified with the BESTCRIT keyword) that will be displayed in the Summary of best subset search spreadsheet, available from the Results dialog. For example, if you specify MAXSUB=12, then you can later review (via option Summary of best subset search on the Results dialog) the 12 best subsets according to the chosen criterion (see Lambda, Misclass, and Crossval below). The default value is 10.

Applies to. GRM, GDA
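
As a rough illustration of what MAXSUB controls, the Python sketch below keeps only the best few subsets from a list of scored candidates. It is a simplified sketch, not the program's implementation: it assumes that a smaller criterion value is better (as is the case for Wilks' Lambda and for misclassification rates), it ignores the "of each size" variant, and the function name best_subsets and the scores are made up for the example.

```python
import heapq

def best_subsets(scored, maxsub=10):
    """Return the `maxsub` (subset, criterion) pairs with the
    smallest criterion values, best first."""
    return heapq.nsmallest(maxsub, scored, key=lambda item: item[1])

# Hypothetical (subset, criterion value) pairs.
scored = [
    (("A",), 0.80),
    (("B",), 0.65),
    (("A", "B"), 0.40),
    (("C",), 0.90),
]

top = best_subsets(scored, maxsub=2)
for subset, value in top:
    print(subset, value)
```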

Specifications

The Specifications group box contains specifications (options) that can be used in the analysis syntax. Note that the specifications discussed below are the ones that are unique to best subset regression. For details about the other specifications on this dialog, see GDA Models Syntax Editor - Specifications tab.

Lambda

Click the lambda button to insert the specification lambda into the Analysis syntax edit field (on the GDA Models Syntax Editor dialog), at the current location of the cursor. This option is used in conjunction with the BESTCRIT keyword (see also MBUILD option bestsubset) to specify that the ordinary Wilks' Lambda value is to be used when comparing the subsets of effects during best subset regression. The Wilks' Lambda statistic for the overall discrimination is computed as the ratio of the determinant (det) of the within-groups variance/covariance matrix to the determinant of the total variance/covariance matrix:

Wilks' Lambda = det(W)/det(T)

The F approximation to Wilks' Lambda is computed following Rao (1951).
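
The ratio det(W)/det(T) can be reproduced with a few lines of Python and NumPy. This is a minimal sketch rather than the program's computation: it assumes W and T are the within-groups and total sums-of-squares-and-cross-products matrices (a common textbook convention, so the scaling is an assumption), and the function name wilks_lambda is hypothetical. The F approximation is not included.

```python
import numpy as np

def wilks_lambda(X, groups):
    """Compute det(W) / det(T) for predictors X (cases by variables)
    and a vector of group labels."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)

    # Total matrix T: deviations from the grand mean.
    centered_total = X - X.mean(axis=0)
    T = centered_total.T @ centered_total

    # Within-groups matrix W: deviations from each group's own mean.
    W = np.zeros_like(T)
    for g in np.unique(groups):
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered

    return np.linalg.det(W) / np.linalg.det(T)

# Small made-up data set: two groups, two predictors.
X = [[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [8, 6]]
groups = [0, 0, 0, 1, 1, 1]
print(wilks_lambda(X, groups))
```

Values close to 0 indicate that the groups are well separated, and values close to 1 indicate little separation, which is why smaller values mark better subsets.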

Misclass

Click the misclass button to insert the specification misclass into the Analysis syntax edit field, at the current location of the cursor. This option is used in conjunction with the BESTCRIT keyword (see also MBUILD option bestsubset) to specify that the misclassification error rate in the analysis (training or learning) sample is to be used when comparing the subsets of effects during best subset regression. The misclassification error rate is computed as the number of misclassified observations divided by the total number of observations.
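
The misclassification error rate described here is simple to compute by hand. The following Python sketch shows the arithmetic; the function name misclassification_rate and the group labels are invented for the example.

```python
def misclassification_rate(observed, predicted):
    """Number of misclassified observations divided by the
    total number of observations."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return wrong / len(observed)

# Two of the five cases are assigned to the wrong group.
observed = ["a", "a", "b", "b", "b"]
predicted = ["a", "b", "b", "b", "a"]
rate = misclassification_rate(observed, predicted)
print(rate)  # 0.4
```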

Crossval

Click the crossval button to insert the specification crossval into the Analysis syntax edit field, at the current location of the cursor. This option is used in conjunction with the BESTCRIT keyword (see also MBUILD option bestsubset) to specify that the misclassification error rate in the cross-validation (test) sample is to be used when comparing the subsets of effects during best subset regression.
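
The difference between the analysis-sample and cross-validation error rates can be seen in a small Python sketch. It is only an illustration under stated assumptions: a nearest-class-mean rule stands in for the classification functions the program actually estimates, and the data, function names, and sample split are all made up.

```python
import numpy as np

def fit_class_means(X, y):
    """Estimate one mean vector per group from the analysis sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    means = np.vstack([X[y == label].mean(axis=0) for label in labels])
    return labels, means

def predict(X, labels, means):
    """Assign each case to the group with the nearest mean."""
    X = np.asarray(X, dtype=float)
    distances = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return labels[distances.argmin(axis=1)]

def error_rate(y_true, y_pred):
    """Share of cases whose predicted group differs from the observed one."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Analysis sample: used to estimate the parameters.
X_train = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]]
y_train = [0, 0, 1, 1]
# Cross-validation sample: not used during estimation.
X_test = [[0.5, 1.0], [8.0, 9.0], [6.0, 6.0]]
y_test = [0, 1, 0]

labels, means = fit_class_means(X_train, y_train)
train_error = error_rate(y_train, predict(X_train, labels, means))
test_error = error_rate(y_test, predict(X_test, labels, means))
print(train_error, test_error)
```

In this toy example the analysis sample is classified without error while one of the three held-out cases is misclassified, which illustrates why the cross-validation rate is usually the more conservative criterion.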

For more information on model building methods, see Model Building in GRM in the Introductory Overview.
