ITrees Results - Manager Tab
Select the Manager tab of the ITrees Results dialog box to access numerous options for growing and pruning the tree, reviewing the tree, and reviewing the nodes of the current tree.
Element Name | Description |
---|---|
Tree (grow, prune) | Select from among these options to interactively or automatically grow the tree. To grow individual branches starting from a particular node, use the options under Node/branch on this tab (described below). |
Grow tree | Click this button to automatically grow the current tree, using all current settings and following the current Model building method as specified on the Interactive Trees Startup Panel - Quick tab. After growing the tree, use any of the Review tree options described below, or the Brush tree option to further review the tree. You can also prune the tree using the various methods and options available in this dialog box. |
Brush tree | Click the Brush tree button to display the current tree and the Brushing Commands dialog containing tree-brushing tools to interactively "brush" the current tree. In this mode, you can select all of the options for growing or pruning the tree, and immediately review the results of the chosen actions on the current tree. See also, Tree Brushing Tools for additional details. |
Grow tree & prune | This option is available only for Classification and Regression Trees (C&RT); it is not available if CHAID or Exhaustive CHAID is selected as the Model building method on the Interactive Trees Startup Panel - Quick tab. Click this button to automatically grow the tree and then prune it to select a "right-sized" tree, given the pruning criteria specified on the ITrees Extended Options dialog - Stopping tab. In general, an automatically grown classification or regression tree needs to be pruned back to a smaller size to avoid overfitting and to derive a tree with good predictive validity (accuracy for predicting new observations). This issue is discussed in greater detail in the General Classification and Regression Trees (GC&RT) Overviews, in particular the Computational Details topic. |
Remove all branches | Click this button to remove all branches from the current tree; this option effectively enables you to "start over" with a new tree after removing all previously grown branches. |
Grow tree 1 level | Click this button to grow the tree one level down from each of the current terminal nodes. The new branches will be grown using the methods consistent with the Model building method as selected on the Interactive Trees Startup Panel - Quick tab. Note that branches will only be grown if this is consistent with the current parameter settings for the current method for growing the tree (e.g., stopping rules as specified on the ITrees Extended Options dialog - Stopping tab). |
Remove 1 level | Click this button to remove one level from the tree, i.e., the set of branches connecting to the current terminal nodes. |
Review tree | Use the options in this section to review the current tree. In order to review all details of large trees with many terminal nodes, use the Tree browser option, which will display a workbook where each node is represented by a single graph, and where the nodes can be navigated using tree-browser facilities familiar from Windows applications (and as implemented, for example, in Statistica workbooks). |
Tree browser | Click the Tree browser button to produce a complete representation of the results tree inside a Statistica workbook-like browser, where every node is represented by a graph containing the respective split rule (unless the respective node is a terminal node) and various summary statistics. Intermediate and terminal nodes will be shown in the browser with different symbols.
This browser provides a complete summary of the results and enables you to efficiently review even the most complex trees (see also, for example, the General Classification and Regression Trees (GC&RT) topic Reviewing Large Trees: Unique Analysis Management Tools). The results displayed will differ depending upon whether the selected response variable is categorical or continuous. |
Results for categorical dependent variable (classification) | If you are analyzing a categorical response variable (i.e., you selected Classification Analysis in the Type of analysis list on the Interactive Trees Startup Panel - Quick tab), then clicking on a node in the tree browser will produce a graph displaying the number of cases in each category of the variable as well as the histogram of statistics for the selected node. |
Results for continuous dependent variable (regression) | If you are analyzing a continuous response variable (i.e., you selected Regression Analysis in the Type of analysis list on the Interactive Trees Startup Panel - Quick tab), then clicking on a node in the tree browser will produce a graph displaying the mean and variance of the variable as well as the plot of normal density with these parameters for the selected node. |
Tree graph | Click the Tree graph button to produce the Tree graph for the current tree. In this graph, each node will be presented as a rectangular box; the terminal nodes are highlighted in red, and the intermediate nodes are highlighted in blue (by default). The following information is usually summarized in this graph: the Node ID, the node size, the selected category of the response and the histogram (for classification-type problems), or the mean and variance at the node (for regression-type problems). For each intermediate node, the graph also shows the splitting information: the splitting criterion that created its child nodes and the name of the predictor used in that criterion. Note that all labels and legends for the graph are produced as custom text and can be edited, moved, or deleted via the Graph Options dialog box.
These tools enable you to review all details of large and complex tree graphs. For reviewing large trees, you can also use the Scrollable tree or Tree browser facilities, which are particularly well suited for browsing the information contained in complex regression or classification trees. |
Tree layout | Click the Tree layout button to display the graph showing the structure of the current tree. Each node will be presented as a rectangular box; terminal nodes are highlighted in red and non-terminal nodes are highlighted in blue. |
Scrollable tree | Click this button to display the same Tree graph, but in a scrollable window.
This option will display a very large graph that can be reviewed (scrolled) "behind" a (resizable) window. Note that all standard graphics editing, zooming, etc., tools for customization and reviewing further details of the graph are still available for this method of display. |
Advanced scrollable tree | Click this button to create a scrollable tree with advanced features:
For classification:
1) Generates a graph tabulating the categories, their counts, and their percentages. If the number of categories exceeds 5, an ellipsis (...) is displayed at the end to indicate that there are more categories.
2) The predicted category is marked with an asterisk ( * ) in the table.
3) A ToolTip is displayed when the mouse pointer hovers over the node.
4) Right-click in the node to display a shortcut menu containing options to produce results or perform operations for the selected node. These options are available while the analysis is active (not closed). Sensitivity options are available if Collect sensitivity data is specified in the specifications dialog box.
For regression:
1) Generates a graph with the Min, Max, Mean, and Std. Dev statistics for the node.
2) Right-click in the node to display a shortcut menu containing options to produce results or perform operations for the selected node. These options are available while the analysis is active (not closed). Sensitivity options are available if Collect sensitivity data is specified in the specifications dialog box.
3) Surrogate information is added to the PMML output. |
Node/branch | Use the options in this group box to review results for individual nodes and to specify exactly how to grow branches from the selected node. |
Node ID | Select from this drop-down list the node (Node ID) for which to display the respective results chosen from the options described below. |
Customize splits | Click this button to display the Customized variable selection and split dialog box. Use the options in this dialog box to specify the exact split for each predictor. This gives you full control over exactly how to grow the branches of the tree. The exact appearance of the dialog will depend on the Model building method selected on the Interactive Trees Startup Panel - Quick tab. For C&RT (classification and regression trees) where only binary splits are allowed, you can specify exactly how to split the respective continuous and/or categorical predictors to achieve two branches. For CHAID trees where each node may grow multiple (more than two) branches, you can specify the number of splits and how to create them for each continuous and/or categorical variable. See the descriptions of the Customize Variable Selection and Split (C&RT Binary Trees) and Customize Variable Selection and Split (CHAID and Exhaustive CHAID) dialog boxes for details. |
Changing a specific split in an existing tree | To change a specific split in an existing tree, first locate the respective node and Node ID from which the branches (that you want to customize) originate. Enter that ID into the Node ID field, and then click the Customize splits button to display the respective Customized variable selection and split dialog (for C&RT Binary or CHAID and Exhaustive CHAID trees). In that dialog box, make the desired selections (specify the split for the respective predictor), and exit the dialog box via the Grow button. As a result, the selected split for the selected predictor will be used to grow branches from the selected node (Node ID). |
Grow branch | Click this button to grow a branch from the selected node, as specified in the Node ID field (and also grow sub-branches, if appropriate). Statistica will use the appropriate automatic methods to grow the branches, consistent with the Model building method selected on the Interactive Trees Startup Panel - Quick tab, and consistent with the specifications selected in the ITrees Extended Options dialog box; note that if no further splits are possible given those settings, then clicking this button will cause no changes to the tree. |
Grow branch 1 level | Click this button to grow a branch by one level from the selected node, as specified in the Node ID field; unlike the Grow branch option described above, this option will create only a single split, i.e., grow the tree from the selected node by only one level. Statistica will use the appropriate automatic methods to grow the branches, consistent with the Model building method selected on the Interactive Trees Startup Panel - Quick tab, and consistent with the specifications selected on the ITrees Extended Options dialog box; note that if no further splits are possible given those settings, then clicking this button will cause no changes to the tree. |
Remove branch | Click this button to remove all branches originating from the selected node (as specified in the Node ID field) from the tree. |
Predictor stats | Click this button to display a spreadsheet with the predictor statistics for the current node. At each node, a particular statistic can be computed to evaluate which predictor would yield the best split or the greatest improvement to the overall fit of the model. The specific statistics that will be reported depend on the Model building method (C&RT vs. CHAID or Exhaustive CHAID) as well as the Type of analysis (Classification Analysis vs. Regression Analysis) selected on the Interactive Trees Startup Panel - Quick tab. |
Split criterion | Click this button to display a summary results spreadsheet with information regarding the exact split at the selected node (specified in the Node ID field); this spreadsheet will show the categories or split for the respective predictor variable used at this node. |
SQL Code | (Selection Rule) Click this button to generate the rules for classifying observations into the respective node (as selected in the Node ID field), in the form of SQL code. To generate SQL code for all terminal nodes, use the SQL code for all terminal nodes option on the ITrees Results - Report tab. |
Data | (Belonging to Node) Click the Data button to display the spreadsheet containing the data belonging to the node selected in the Node ID field. Simultaneously, a parallel coordinate plot for these data will be produced for the selected node, showing the pattern of values for the observations belonging to this node, across the predictor variables. This graph is extremely useful for identifying "typical patterns" of predictor values for particular terminal classifications or nodes. |
Histogram of DV | (Belonging to Node) Click this button to produce the histogram of responses for the data belonging to the selected node. |
Sensitivity | The Sensitivity graph is available only for C&RT models, and only when the Collect sensitivity analysis data check box is selected on the Advanced tab of the ITrees Extended Options dialog box. The sensitivity graph illustrates how the goodness of split changes across different cut-off values for a specific continuous predictor variable for the selected node (Node ID). The blue line depicts the improvement or goodness of split as it changes across different split or cut-off values of the specified continuous predictor variable. The red vertical line indicates the cut-off value that produces the largest importance value. In addition, a box plot at the bottom of the graph displays the distribution of the specified continuous predictor variable. |
Sensitivity by rank | The Sensitivity by rank graph is equivalent to the Sensitivity graph with the exception that the ranked cut-off values are displayed on the x-axis instead of the raw values. |
Select a surrogate | Surrogate splits are available only for classification and regression trees (when C&RT is selected as the Model building method on the Interactive Trees Startup Panel - Quick tab), when the currently selected node (selected in the Node ID field) is not a terminal node, and when the Number of surrogates on the ITrees Extended Options dialog box - Advanced tab is set to greater than 0.
By choosing "similar" predictors (surrogates) with valid data, cases (observations) with missing data can be classified so that such cases can be included in the analysis. Specifically, if no valid data are available for the best predictor (the one that would yield the greatest improvement in the fit of the tree model) for some cases, the program can use a surrogate predictor for the split instead. Note that the surrogate is not (necessarily) the next-best predictor for splitting at the selected node, but the predictor that will generate the most similar assignment of observations to child nodes, i.e., the predictor that is most similar to the primary predictor with respect to the specific split. The degree of association of the surrogates to the original split variable can be reviewed via the Surrogate stats option. The maximum number of such surrogate variables allowed (e.g., when cases have missing data for the best predictor, the first surrogate, second surrogate, and so on) can be specified via the Number of surrogates option on the ITrees Extended Options dialog box - Advanced tab. By default, the program will automatically select the best surrogate variable at the selected node (as specified in the Node ID field). Select this option to choose a different primary surrogate instead. Use the Surrogate stats option (see below) to review the statistics for each surrogate variable. |
Surrogate stats | Click this button to produce a spreadsheet showing the degree of association between each surrogate and the respective primary split variable; the larger the value of this statistic for a surrogate variable, the more similar is the split of the observations into child nodes (at the current node) when you use the surrogate instead of the original split variable. If a variable's association value is less than or equal to 0, that variable is dropped from consideration as a surrogate. If all variables have an association value <= 0 for a specific node, then that node will not have a surrogate variable associated with it. The predictors in this spreadsheet are sorted in descending order, from the best surrogate to the worst surrogate; the spreadsheet shows as many surrogate predictors as are specified via the Number of surrogates option on the ITrees Extended Options dialog box - Advanced tab. The spreadsheet also displays the split criterion for each surrogate variable. |
Pred. stats & details | The Pred. stats & details spreadsheet is only available for C&RT models, and when the Collect sensitivity analysis data check box is selected on the Advanced tab of the ITrees Extended Options dialog box. The Pred. stats & details spreadsheet displays the split information details for each predictor variable for the selected node (Node ID). |
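
The Sensitivity option described above plots goodness of split against candidate cut-off values for one continuous predictor at a node. As a rough illustration only, the following Python sketch computes such a curve for a regression-type node. It assumes that the improvement is the reduction in the sum of squared errors and that candidate cut-offs lie midway between consecutive distinct predictor values; neither assumption is taken from Statistica, and all function names are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def sensitivity_curve(x, y):
    """Return (cutoff, improvement) pairs for one continuous predictor.

    Assumption (not from the Statistica documentation): improvement is
    SSE(parent) - SSE(left child) - SSE(right child), and candidate
    cut-offs are midpoints between consecutive distinct values of x.
    """
    parent = sse(y)
    distinct = sorted(set(x))
    curve = []
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        curve.append((cut, parent - sse(left) - sse(right)))
    return curve


def best_cutoff(curve):
    """Return the (cutoff, improvement) pair with the largest improvement."""
    return max(curve, key=lambda pair: pair[1])


# Hypothetical data for the cases belonging to one node.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]

curve = sensitivity_curve(x, y)
cut, gain = best_cutoff(curve)
print("best cut-off:", cut)
```

In this sketch, the list returned by `sensitivity_curve` plays the role of the blue line in the graph, and the cut-off returned by `best_cutoff` plays the role of the red vertical line. Replacing the cut-off values with their ranks on the x-axis would give the analogue of the Sensitivity by rank graph.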