Using a Model Summary


A model summary is automatically created when you run a regression modeling or classification modeling calculation. The model summary displays the name of the model, the model type, and the model formula.

For parametric models (Linear Regression and Logistic Regression), additional summary statistics appropriate for the particular model type are also shown. These statistics give an indication of how well the model fits the data and can be used to compare one model with another model of the same type.

For tree models, a text description of the tree structure is displayed, followed by a table showing the model improvement at each split. Finally, a summary of each individual split, starting at the root node, is shown.

 

Icon

Description

prd_refit_model_i.png

[Only visible if there have been changes to the underlying data table, e.g., through filtering.]

Refits the model.

prd_edit_i.png

Opens the Regression Modeling or the Classification Modeling dialog where you can make changes to your current model.

prd_evaluate_model_i.png

Opens the Evaluate Model dialog where you can test the model against another set of data in order to see how well the model fits other data.

prd_predict_i.png

Opens the Insert Predicted Columns dialog where you can use the model to insert predicted columns into a data table.

prd_duplicate_model_i.png

Opens the Duplicate Model dialog where you can type a new name and create a duplicate of the model. This allows you to edit a copy rather than the original model when you need to make adjustments.

Summary Statistics

Regression Model

Description

Residual standard error

The residual standard error is a measure of the error variability after the effects of the predictors used in the model are accounted for. It is in the units of the response variable.

R-squared (or R2) and Adjusted R-squared

R-squared measures the fraction of the variability in the data that is explained by the model. It is a number between 0 and 1, where 1 indicates a perfect fit (all observations are predicted exactly). The Adjusted R-squared is like R-squared but with an adjustment that accounts for the number of predictors in the model; adding predictors, even meaningless ones, will always increase R-squared, whereas the adjusted R-squared can decrease (see the sketch following this table).

F-statistic

The F-statistic is a statistical test of the overall significance of the model. A statistically significant model will have a small p-value (typically less than 0.05). The DF (degrees of freedom) values are associated with the F-statistic and are used to compute the p-value.
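As a rough illustration of how these regression statistics relate to one another, the Python sketch below computes the residual standard error, R-squared, adjusted R-squared, and the overall F-statistic with its p-value for an ordinary least-squares fit. It is a minimal sketch on made-up data using NumPy and SciPy, not the implementation behind the model summary; all variable names and values are placeholders.

import numpy as np
from scipy import stats

# Made-up data: n observations, p predictors (placeholder values).
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.5, -0.5, 0.0]) + rng.normal(scale=1.0, size=n)

# Ordinary least-squares fit with an explicit intercept column.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
residuals = y - Xd @ beta

# Residual standard error: error variability left after the predictors,
# expressed in the units of the response (divided by residual DF).
df_resid = n - p - 1
rse = np.sqrt(np.sum(residuals**2) / df_resid)

# R-squared and adjusted R-squared.
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean())**2)
r2 = 1.0 - ss_res / ss_tot
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / df_resid

# Overall F-statistic (full model vs. intercept only) and its p-value.
df_model = p
f_stat = ((ss_tot - ss_res) / df_model) / (ss_res / df_resid)
p_value = stats.f.sf(f_stat, df_model, df_resid)

print(f"RSE={rse:.3f}  R2={r2:.3f}  adj R2={adj_r2:.3f}  "
      f"F({df_model},{df_resid})={f_stat:.2f}  p={p_value:.3g}")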

 

Classification Model

Description

Null deviance and Residual deviance

The deviance is another measure of the unexplained variability in the data. The null deviance is the deviance for a model with no predictors (intercept only), and the residual deviance is the deviance remaining after the effects of the predictors used in the model are accounted for.

AIC

AIC is the Akaike Information Criterion, a measure of the goodness of fit of the model. Like the adjusted R-squared for regression models, it takes the number of predictors included in the model into account. For the same response variable and different combinations of predictor variables, the model with the smallest AIC is preferred (see the first sketch following this table).

CP nsplit rel_error xerror xstd

This table shows a collection of optimal pruned trees based on the value of the complexity parameter. For each tree, the table lists the complexity parameter value (CP), the number of splits (nsplit; the number of terminal nodes in the tree is nsplit + 1), the relative error (rel_error), the cross-validated error (xerror), and the standard error of the cross-validated error (xstd). The error columns are scaled so that the first node has an error of 1, and the complexity values are scaled similarly. A loosely analogous sketch follows this table.
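For the classification (logistic regression) statistics above, the following Python sketch fits a binomial GLM with the statsmodels package and reads off the null deviance, residual deviance, and AIC. It is a minimal sketch on made-up data, not the modeling tool's own computation; the data and variable names are placeholders.

import numpy as np
import statsmodels.api as sm

# Made-up binary-response data (placeholder values).
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
logits = 0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# Logistic regression fitted as a binomial GLM (intercept added explicitly).
Xd = sm.add_constant(X)
result = sm.GLM(y, Xd, family=sm.families.Binomial()).fit()

# Null deviance: intercept-only model (no predictors).
# Residual deviance: what remains after the predictors are accounted for.
# AIC: for binary data, residual deviance plus 2 per estimated parameter.
print("Null deviance:    ", result.null_deviance)
print("Residual deviance:", result.deviance)
print("AIC:              ", result.aic)
print("Check: deviance + 2*k =", result.deviance + 2 * Xd.shape[1])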

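The CP table itself is produced by the underlying tree-fitting routine, but a loosely analogous view can be sketched in Python with scikit-learn's cost-complexity pruning path. The sketch below uses made-up data and imitates the cross-validated error column with a simple cross-validation loop; unlike the table described above, the errors here are not rescaled so that the first node has an error of 1.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Made-up classification data (placeholder values).
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Cost-complexity pruning path: each alpha corresponds to one pruned subtree,
# playing a role similar to the CP column described above.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    nsplit = tree.get_n_leaves() - 1        # terminal nodes = nsplit + 1
    resub_error = 1.0 - tree.score(X, y)    # analogous to rel_error (unscaled)
    cv_scores = cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha), X, y, cv=5)
    cv_error = 1.0 - cv_scores.mean()       # analogous to xerror (unscaled)
    print(f"alpha={alpha:.4f}  nsplit={nsplit}  "
          f"resub_error={resub_error:.3f}  cv_error={cv_error:.3f}")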
See also:

Using a Table of Coefficients

Available Diagnostic Visualizations