Stepwise Model Builder - Logistic Regression
Ribbon bar. In Statistica, open a data set. Then, select the Statistics tab. In the Advanced/Multivariate group, click Advanced Models, and select Stepwise Model Builder to display the Stepwise Model Builder - Logistic Regression Startup Panel.
Classic menus. Open a data set. On the Statistics - Advanced Linear/Nonlinear Models submenu, select Stepwise Model Builder to display the Stepwise Model Builder - Logistic Regression Startup Panel.
Overview and Workflow
Use the options in the Stepwise Model Builder - Logistic Regression Startup Panel to compute marginal predictor statistics given a current model. Specifically, each variable listed in the Marginal Results Table is entered, one at a time, into the logistic regression model that contains the predictors listed in the Model Results Table, so that you can evaluate the unique contribution of each predictor candidate not yet in the equation.
First, click the Select variables button and select a discrete dependent (Y) variable and two or more continuous or categorical (discrete) predictor candidates (X variables).
Next, select (highlight) the predictor candidates in the Marginal Analysis Variables pane, and click either the Full sample button or the Subsample button to compute the results shown in the Marginal Results Table.
Then select (highlight) the predictors in the Marginal Results Table that are to be entered into the full Model Results Table, and click the Add variable button to:
1) Estimate the parameters of the logistic regression model including the selected predictors and any predictors previously entered into the regression equation, and
2) Re-estimate the results for all predictor candidates in the Marginal Results Table.
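The workflow above can be sketched in a few lines of Python. This is an illustration only, not Statistica's implementation: the documentation does not state which marginal statistic the program reports, so the sketch assumes a likelihood-ratio chi-square for each candidate. The function names (solve, fit_logistic, marginal_analysis), the variable names (signal, noise), and the toy data are all hypothetical, and every predictor is treated as continuous.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logistic(rows, y, iters=25):
    """Fit a logistic regression by Newton-Raphson (an intercept is added).

    Returns (coefficients, maximized log-likelihood)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k

    def prob(xi):
        return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))

    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX, the information matrix
        for xi, yi in zip(X, y):
            p = prob(xi)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    ll = sum(math.log(prob(xi)) if yi == 1 else math.log(1.0 - prob(xi))
             for xi, yi in zip(X, y))
    return beta, ll

def marginal_analysis(data, y, in_model, candidates):
    """Enter each candidate, one at a time, into the current model.

    ASSUMPTION: the marginal statistic is the likelihood-ratio chi-square."""
    def design(names):
        return [[data[v][i] for v in names] for i in range(len(y))]
    _, ll_current = fit_logistic(design(in_model), y)
    results = {}
    for name in candidates:
        _, ll_new = fit_logistic(design(in_model + [name]), y)
        results[name] = 2.0 * (ll_new - ll_current)
    return results

# Hypothetical toy data: 'signal' is related to y, 'noise' is not.
data = {
    "signal": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "noise":  [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
}
y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

# Step 1: marginal results with no predictors in the model yet.
first_pass = marginal_analysis(data, y, [], ["signal", "noise"])
# Step 2: "Add variable" for signal, then re-estimate the remaining candidate.
second_pass = marginal_analysis(data, y, ["signal"], ["noise"])
print(first_pass, second_pass)
```

In this sketch, adding a variable simply means moving its name from the candidate list to the in_model list and rerunning marginal_analysis, which mirrors the two numbered steps above.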
Estimates from Subsample, Full sample. You can estimate the marginal analysis results (including those accessible via the options in the Marginal Results group box) using either a subsample of observations or all observations. You can also select (highlight) one or more predictors in the Marginal Results Table and re-estimate their marginal results for the Full sample or Subsample. The Sample column in the Marginal Results Table indicates whether the currently displayed results were computed from a random Subsample or the Full sample. When the results in the Marginal Results Table are re-estimated after you click Add variable or Remove variable, the same sample option (Subsample or Full sample) and the same sample are used as before; a new Subsample is drawn only when the Subsample button is clicked.
The Model Results Table statistics are always computed for the entire data set.
Discrete (categorical) predictor candidates are always treated as a unit. This means that you cannot separate the set of discrete values available in a discrete predictor candidate; all values are always added to or removed from models and selections in unison, even if only a single code is selected (highlighted) when the respective predictor is moved into or out of a results table.
When discrete predictors are removed from the prediction equation or the Marginal Results Table, the same logic applies.
Deploying models to Enterprise, Decisioning Flows. Click the Deploy model button (in the Model group box at the top-right of the Startup Panel) to save models to Enterprise. These models can then be referenced in Decisioning Flows for scoring or conditional scoring using the Rules node.
Option Descriptions
Select variables. Click this button to display a standard variable selection dialog. Select a discrete (categorical) Dependent or Y variable with at least two codes (for example, a binary credit default indicator), and two or more Continuous and/or Categorical Predictors (predictor candidates). After you exit the variable selection dialog (click the OK button), the respective variable names are transferred into the Marginal Analysis Variables pane.
Marginal Analysis Variables. After selecting variables for the analyses, this pane will show the selected variable names, their type (Continuous or Categorical), and the variable number in the input file. You can select one or more predictors in this list by highlighting them; to run marginal analyses on the selected variables, click either the Full sample or Subsample button in the Run Marginal Analysis group box.
Set Logistic Parameters.
Validation Sample. Optionally, you can select a variable containing codes to identify a validation sample; double-click the top box to display a variable selection dialog listing the variables for the analysis. Then, select a Code that identifies the observations to be held out for the validation sample, that is, the observations from which the model analysis results are recomputed when you use the Validation option (button located in the Model Analysis group box). Note that the Validation button always displays the results recomputed from the validation sample in results spreadsheets; the results shown in the Model Results Table in the Stepwise Model Builder are always computed for all cases.
Dependent (Y) Variable. As described in the Introductory Overview, the program will compute parameter estimates and other results for a logistic regression. Select here the two (binary) codes of the dependent (Y) variable that are to be used for the analyses.
Project. Use the Save project and Open project options to save work in progress and to retrieve previously saved projects to continue working with the same variables and model.
Model.
Run Marginal Analysis. Click either the Full sample or Subsample button to add the selected (highlighted) predictor candidates from the Marginal Analysis Variables pane to the Marginal Results Table, and to compute the respective marginal analysis results.
Marginal Results Table. The Marginal Results Table will display the marginal analysis results for the currently selected predictor candidates. Right-click any column header in the Marginal Results Table to display a shortcut menu containing check boxes adjacent to available statistics; when a check box is selected, the respective column is added to the table; when a check box is cleared, the respective column is hidden in the table.
Notes:
Sorting the variable list. Click the columns in the Marginal Results Table to sort the table by the respective column values in ascending or descending order.
Selecting predictor candidates in the Marginal Results Table. To select predictor candidates in the Marginal Results Table, click the respective predictor candidate. Use CTRL+click or SHIFT+click to select specific predictor candidates or lists of contiguous predictor candidates, respectively.
Re-calculating marginal results. Click the Full sample or Subsample button to recalculate the marginal analysis results for the highlighted (selected) predictor variables.
Adding variables to the Model Results Table. Click the Add variable button to move selected (highlighted) predictors into the model and to update the Marginal Results Table.
Marginal Results.
Add/Remove Model Variables.
The results in the Model Results Table are always computed for the full sample. After the model parameters are updated, the Marginal Results Table results are then recalculated for all predictor candidates currently not in the model.
Model Results Table. The Model Results Table shows the parameter estimates and summary statistics for the current model, that is, the model with the predictors listed in the pane and computed from the full sample. Right-click any column header in the Model Results Table to display a shortcut menu containing check boxes adjacent to available statistics; when a check box is selected, the respective column is added to the table; when a check box is cleared, the respective column is hidden in the table.
Note:
Removing variables. To remove predictors from the current model, highlight the respective predictors and then click the Remove variable button. The predictors will be removed from the model, the model will be re-estimated with the remaining predictors, and the Marginal Results Table will be recalculated for all predictor candidates including those that were removed from the model.
Model Analysis.
- A lift chart for the prediction of the dependent (Y) variable
- Normal probability plot of residuals
- ROC curve and values/statistics
These results are always computed for the current model (with predictors listed in the Model Results Table) and the full sample.
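For readers who want to see what the ROC and lift results summarize, the following minimal Python sketch computes them from observed 0/1 outcomes and predicted probabilities. The exact definitions used by the program are not given here, so the sketch makes its own assumptions: the ROC curve uses every distinct predicted probability as a cut-off, the area under it is computed with the trapezoidal rule, and lift is the rate of 1s among the top-ranked fraction of cases divided by the overall rate. The function names and the data are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def lift_at(y_true, scores, fraction):
    """Rate of 1s in the top `fraction` of cases (by score) over the base rate."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

# Hypothetical observed outcomes and predicted probabilities of a 1.
y_obs = [1, 1, 0, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

area = auc(roc_points(y_obs, p_hat))           # 0.75
top_quarter_lift = lift_at(y_obs, p_hat, 0.25)  # 2.0
print(area, top_quarter_lift)
```

An area of 0.5 corresponds to a model that ranks cases no better than chance, and a lift of 1.0 means the top-ranked cases contain 1s at the same rate as the whole sample.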
k replications. The number of bootstrap replications.
p% in holdout sample. The proportion of hold-out cases for estimating predictive accuracy [and the predicted percent-1s (default rate), as well as the difference between the predicted and observed percent-1s] in the holdout sample in each bootstrap replication. If this value is 0 (zero), predictive accuracy is computed from the full (100%) training sample, that is, the sample used for estimating the parameters.
The program will then create k replications of the data via random sampling with replacement, and designate a proportion p of the cases in each replication as the hold-out or testing sample. Next, the respective model will be fit to all cases not in the hold-out sample (that is, to the training sample) in the respective replication. The parameter estimates and the percent (rate) of 1s (for example, percent Default) observed and predicted in the hold-out sample are recorded for each replication. You can then evaluate the distribution of the parameter estimates and of the percent-of-1s observed and predicted over the replications.
If no hold-out sample is designated (Proportion of cases for holdout (p)=0), the respective percent-of-1s observed and predicted will always be computed from the training sample, that is, from all observations in each of the k replications.
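The resampling logic described above can be sketched as follows. To keep the example short, the logistic regression is replaced by a stand-in: for a single binary predictor, the fitted probabilities of a logistic regression equal the observed rate of 1s at each predictor level, so no iterative fitting is needed. Several details are assumptions of this sketch rather than documented behavior: p is given as a proportion between 0 and 1 (and must be less than 1), the predicted percent-of-1s is taken to be the share of cases classified as 1 at the cut-off, and all names (fit_group_rates, bootstrap, the record keys) and the data are hypothetical.

```python
import random

def fit_group_rates(rows):
    """Stand-in for the model fit: rate of 1s per level of one binary predictor."""
    overall = sum(y for _, y in rows) / len(rows)
    rates = {}
    for level in {x for x, _ in rows}:
        ys = [y for x, y in rows if x == level]
        rates[level] = sum(ys) / len(ys)
    # Levels not seen in training fall back to the overall training rate.
    return lambda x: rates.get(x, overall)

def bootstrap(rows, k, p, cutoff=0.5, seed=0):
    """Draw k replications; hold out a proportion p of each (0 <= p < 1)."""
    rng = random.Random(seed)
    n = len(rows)
    n_hold = round(p * n)
    records = []
    for _ in range(k):
        # One replication: n cases drawn at random with replacement.
        replication = [rows[rng.randrange(n)] for _ in range(n)]
        holdout, training = replication[:n_hold], replication[n_hold:]
        predict = fit_group_rates(training)
        # With p = 0 there is no hold-out sample, so evaluate on the training sample.
        evaluate = holdout if n_hold > 0 else training
        observed = 100.0 * sum(y for _, y in evaluate) / len(evaluate)
        predicted = 100.0 * sum(1 for x, _ in evaluate
                                if predict(x) > cutoff) / len(evaluate)
        records.append({"n_train": len(training), "n_eval": len(evaluate),
                        "observed_pct": observed, "predicted_pct": predicted})
    return records

# Hypothetical data: (binary predictor, 0/1 outcome) for 20 cases.
rows = [(1, 1)] * 7 + [(1, 0)] * 3 + [(0, 1)] * 2 + [(0, 0)] * 8

with_holdout = bootstrap(rows, k=200, p=0.25)
no_holdout = bootstrap(rows, k=200, p=0.0)
print(with_holdout[0], no_holdout[0])
```

The list of records plays the role of the per-replication results: the spread of observed_pct and predicted_pct (and of their difference) over the k replications indicates how stable the predicted rate of 1s is.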
Cut-off probability for classification p. Several results graphs, as well as the Bootstrap option, will compute predicted classifications based on prediction probabilities computed by the model. This parameter determines the cut-off probability for classifying a case as 1 (Bad code); that is, if the predicted probability of Bad for an observation is greater than p specified here, the respective observation will be classified as 1 or Bad.
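The classification rule is a strict comparison, which the following short Python sketch makes explicit. The function name classify and the probabilities are hypothetical; the only behavior taken from the description above is that a case is classified as 1 (Bad) when its predicted probability is greater than the cut-off p.

```python
def classify(probabilities, cutoff=0.5):
    """Classify a case as 1 (Bad) only if its probability exceeds the cut-off."""
    return [1 if prob > cutoff else 0 for prob in probabilities]

# Hypothetical predicted probabilities of Bad for four observations.
predicted = classify([0.20, 0.50, 0.51, 0.90], cutoff=0.5)
print(predicted)  # [0, 0, 1, 1]
```

Note that a probability exactly equal to the cut-off is classified as 0 under this rule. Lowering the cut-off classifies more cases as 1, and raising it classifies fewer.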
Model Results. Use these options to compute various model statistics and summaries for the current model with predictors listed in the Model Results Table, and computed from the full sample.