Model Comparison Node

Overview

The Model Comparison node compares models, selects the best one based on the chosen comparison statistics, and generates PMML.

The node then automatically checks in the Champion Challenger model to Statistica or Statistica Enterprise.

What you can compare:

  • Models from any nodes
  • New models vs older models
  • Models with new data
  • Models stored in Statistica or Statistica Enterprise
  • Multiple Enterprise models

Note: The models must be of the same type. For example, you cannot compare a Regression Tree with a Classification Tree. If you try to do so, an error is returned.
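
The same-type rule in the note can be illustrated outside the product. The following is a minimal sketch, not the node's own implementation. It assumes each model is available as PMML text and relies on the functionName attribute that PMML model elements carry (for example "regression" or "classification"). The names mining_function and check_same_type, and the two sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two tiny, hypothetical PMML documents used only for illustration.
REGRESSION_TREE = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">'
    '<Header/><DataDictionary/>'
    '<TreeModel functionName="regression"/></PMML>'
)
CLASSIFICATION_TREE = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">'
    '<Header/><DataDictionary/>'
    '<TreeModel functionName="classification"/></PMML>'
)


def mining_function(pmml_text):
    """Return the functionName of the first model element in a PMML document."""
    root = ET.fromstring(pmml_text)
    for element in root:
        # Only model elements (TreeModel, RegressionModel, ...) have functionName.
        name = element.get("functionName")
        if name is not None:
            return name
    raise ValueError("no model element with a functionName attribute found")


def check_same_type(pmml_documents):
    """Return the shared model type, or raise ValueError if the types differ."""
    kinds = {mining_function(text) for text in pmml_documents}
    if not kinds:
        raise ValueError("at least one model is required")
    if len(kinds) > 1:
        raise ValueError(
            "cannot compare models of different types: " + ", ".join(sorted(kinds))
        )
    return kinds.pop()


print(check_same_type([REGRESSION_TREE, REGRESSION_TREE]))
```

Passing REGRESSION_TREE together with CLASSIFICATION_TREE raises ValueError, which mirrors the error described in the note.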

What it does:

  • Compares different models
  • Generates a Summary of Quality matrix
  • Selects a champion model
  • Compares a previous model with the same model applied to new data
  • Compares the statistics recorded in the PMML. You can then use Rapid Deployment to review the statistics from new data.

The downstream PMML node must be linked to Enterprise, with the link option selected.

PMML is generated when the comparison is run.

The downstream node is the challenging (champion) model, which is the model that is updated.

The upstream node can be any node.

Input

  • Connect the Model Comparison node to the model nodes and to the data input. The input data used for comparison should specify the Observed and Predicted variables, which you can set with the Select Variables node.

For classification

Select the observed variable in the dependent categorical field and the prediction variable in the predictor categorical field to generate model quality statistics based on the observed and predicted columns.
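
As an illustration of what a quality statistic built from an observed and a predicted column looks like, here is a minimal sketch of the misclassification error rate. It is not the node's code. The function names and the sample values are hypothetical.

```python
from collections import Counter


def misclassification_rate(observed, predicted):
    """Fraction of cases where the predicted class differs from the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one case is required")
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return wrong / len(observed)


def confusion_counts(observed, predicted):
    """Count each (observed, predicted) pair, i.e. the cells of a confusion matrix."""
    return Counter(zip(observed, predicted))


# Hypothetical observed and predicted columns for five cases.
observed = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes"]

print(misclassification_rate(observed, predicted))  # 2 of 5 cases differ: 0.4
print(confusion_counts(observed, predicted))
```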

For regression

Select the observed variable in the dependent continuous field and the prediction variable in the predictor continuous field to generate model quality statistics based on the observed and predicted columns.
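
The regression statistics can be sketched in the same way. The formulas below are the common textbook definitions and are an assumption here: this topic does not state the exact definitions the node uses (for example, the sign convention of the mean error, or whether R squared is computed as 1 - SSE/SST). The function name and the sample values are hypothetical.

```python
import math


def regression_statistics(observed, predicted):
    """Compute common regression quality statistics from two numeric columns."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("at least one case is required")

    # Assumed convention: residual = observed - predicted.
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_observed = sum(observed) / n
    sst = sum((o - mean_observed) ** 2 for o in observed)

    return {
        "mean_error": sum(residuals) / n,
        "absolute_mean_error": sum(abs(r) for r in residuals) / n,
        "root_mean_squared_error": math.sqrt(sse / n),
        "sum_of_squared_error": sse,
        # R squared is undefined when the observed column is constant.
        "r_squared": 1 - sse / sst if sst > 0 else float("nan"),
    }


# Hypothetical observed and predicted columns for four cases.
stats = regression_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
for name, value in stats.items():
    print(name, round(value, 4))
```

In this example the positive and negative residuals cancel, so the mean error is 0 even though the absolute mean error is 0.25. This is one reason to look at more than one statistic.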

Enterprise

  • Specify an Enterprise model using the PMML Model node from the Data Mining menu.
  • Compare multiple Enterprise models.

Comparing to Enterprise is the default. The node gets the existing model from Enterprise and checks whether the selected model is different. If it is, the model in Enterprise is updated.
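
The "update only if different" behavior can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not how Statistica Enterprise works internally: a plain dictionary stands in for the Enterprise repository, and "different" is decided by hashing the whitespace-normalized PMML text, which this topic does not specify. The names fingerprint and update_if_different are hypothetical.

```python
import hashlib


def fingerprint(pmml_text):
    """Hash the PMML text, ignoring differences that are only whitespace."""
    normalized = " ".join(pmml_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def update_if_different(store, model_name, champion_pmml):
    """Store the champion only when it differs from the stored model.

    `store` is a dict standing in for the Enterprise repository (assumption).
    Returns True when an update happened, False otherwise.
    """
    current = store.get(model_name)
    if current is not None and fingerprint(current) == fingerprint(champion_pmml):
        return False
    store[model_name] = champion_pmml
    return True


enterprise = {"churn": "<PMML><TreeModel functionName='classification'/></PMML>"}

# Same model with different formatting: no update.
print(update_if_different(
    enterprise, "churn",
    "<PMML>\n  <TreeModel functionName='classification'/>\n</PMML>".replace(">\n  <", "><").replace(">\n<", "><"),
))

# A different model: the stored model is replaced.
print(update_if_different(
    enterprise, "churn",
    "<PMML><NeuralNetwork functionName='classification'/></PMML>",
))
```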

Output

The output from running this node is an XML spreadsheet.

The output shows model statistics that come from a PMML node and are based on the input data.

To have Enterprise nodes updated automatically, select Link to Enterprise on the PMML tab of the PMML Model node (from the Data Mining menu), then click the Deploy to Enterprise button.

When these options are selected, a new downstream node is created for the selected best model, and the model is updated in Enterprise.

The output differs depending on the scenario.

Options:

Specifications/Quick tab

Model Type: Auto. If you select Auto, the node tries to detect the model type automatically from the connected nodes and compares the models using the defined statistics. You can also select one of the following:
  • Classification
  • Regression

Regression comparison statistics: Select the statistic on which to compare regression models. The model with the lowest error, or the highest R squared, is selected as the best model. These are the choices:
  • Mean error
  • Absolute mean error
  • Root mean squared error
  • Sum of squared error
  • R squared

Classification comparison statistics: Select the statistic on which to compare classification models:
  • Misclassification error rate
  • Chi-square statistic
  • G-square statistic

Use test sample statistics only for selecting model: Select this check box to use only test sample statistics when selecting a model. When it is selected, only models that have test statistics and models that use data files are compared. Training statistics are ignored and are not used for the comparison. For instance, an Advanced Classification Trees (C&RT) model does not have a test Data usage tag, so that model is not used in the comparison.

Results/Quick tab

Model statistics

Results/Enterprise tab

Update selected model to Enterprise: Select this check box to update the model in Enterprise automatically. The downstream model node should be linked to Enterprise with the link option selected. The model is updated if the selected model is different from the model in Enterprise.

Selecting this option uses the data supplied by the connected Select Variables dialog box (Data tab, Variables icon/select variables). The input data used for comparison should specify Observed and Predicted variables.
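
To make the selection rule from the Specifications/Quick tab concrete, here is a minimal sketch of choosing a champion by one statistic, with the effect of the test-sample-only option. It is not the node's implementation. The data layout, the select_champion name, the model names, and the numbers are hypothetical. The sketch also assumes that, when the option is off, test statistics are preferred and training statistics are used only as a fallback.

```python
# Statistics where a larger value means a better model (assumption based on
# "lower error and high R square" in the option description).
HIGHER_IS_BETTER = {"r_squared"}


def select_champion(candidates, statistic, test_only=False):
    """Return the name of the best model for the chosen statistic.

    Each candidate is a dict with a "name" and optional "train" / "test"
    dicts of statistics. With test_only=True, models without test
    statistics are left out of the comparison.
    """
    scored = []
    for model in candidates:
        stats = model.get("test")
        if stats is None and not test_only:
            stats = model.get("train")  # fallback when no test sample exists
        if stats is None or statistic not in stats:
            continue
        scored.append((stats[statistic], model["name"]))

    if not scored:
        raise ValueError("no model has the requested statistic")
    best = max(scored) if statistic in HIGHER_IS_BETTER else min(scored)
    return best[1]


# Hypothetical candidates; the C&RT model has no test statistics.
candidates = [
    {"name": "Boosted Trees",
     "train": {"root_mean_squared_error": 1.8},
     "test": {"root_mean_squared_error": 2.4}},
    {"name": "C&RT",
     "train": {"root_mean_squared_error": 2.0}},
    {"name": "Neural Network",
     "train": {"root_mean_squared_error": 1.5},
     "test": {"root_mean_squared_error": 2.1}},
]

print(select_champion(candidates, "root_mean_squared_error"))
print(select_champion(candidates, "root_mean_squared_error", test_only=True))
```

With the option off, the C&RT model wins on its training error (2.0). With the option on, it is excluded and Neural Network wins on its test error (2.1). Training error is usually optimistic, which is a reason to restrict the comparison to test statistics.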