Overview
The node selects the best model based on the chosen comparison statistics and produces a PMML document.
It then automatically checks the Champion Challenger model into Statistica or Statistica Enterprise.
What you can compare:
- Models from any nodes
- New models vs older models
- Models with new data
- Models stored in Statistica or Enterprise
- Multiple Enterprise models
Note: The models must be of the same type. For example, you cannot compare a Regression Tree with a Classification Tree. If you try to do so, an error is returned.
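The same-type rule, and the Auto setting of Model Type described under Options, can be illustrated with a short Python sketch. It assumes each model arrives as a PMML string whose model element carries the standard PMML functionName attribute ("classification" or "regression"). The helper names and the set of model element names are hypothetical illustrations, not Statistica's API.

```python
from xml.etree import ElementTree as ET

# Illustrative subset of PMML model element names (not exhaustive).
MODEL_ELEMENTS = {"TreeModel", "RegressionModel", "GeneralRegressionModel",
                  "NeuralNetwork", "MiningModel", "SupportVectorMachineModel"}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so any PMML 4.x namespace is accepted."""
    return tag.rsplit("}", 1)[-1]


def model_function(pmml_text: str) -> str:
    """Return the functionName of the first model element in a PMML document."""
    root = ET.fromstring(pmml_text)
    for elem in root:
        if local_name(elem.tag) in MODEL_ELEMENTS:
            function = elem.get("functionName")
            if function is None:
                raise ValueError(f"{local_name(elem.tag)} has no functionName")
            return function
    raise ValueError("no supported model element found")


def detect_common_type(pmml_texts: list[str]) -> str:
    """Hypothetical 'Auto' detection: all models must share one function type."""
    types = {model_function(text) for text in pmml_texts}
    if len(types) != 1:
        raise ValueError(f"models must be of the same type, got {sorted(types)}")
    return types.pop()
```

Passing one model with functionName="classification" and another with functionName="regression" raises ValueError, mirroring the error described in the note.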
What it does:
- Compares different models
- Generates a Summary of Quality matrix
- Selects a champion model
- Compares a previous model with the same model applied to new data
- Compares the statistics recorded in the PMML. You can then use Rapid Deployment to review the statistics computed on new data.
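Comparing recorded statistics depends on the quality statistics stored in each PMML file. The sketch below assumes they are recorded in the standard PMML PredictiveModelQuality element, whose dataUsage attribute labels statistics as training, test, or validation, and reads them with Python's standard library. Whether Statistica writes exactly these elements is an assumption, recorded_statistics is a hypothetical helper, and a single target field per model is assumed.

```python
from xml.etree import ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def recorded_statistics(pmml_text: str) -> dict[str, dict[str, float]]:
    """Collect numeric PredictiveModelQuality attributes, keyed by dataUsage."""
    root = ET.fromstring(pmml_text)
    by_usage: dict[str, dict[str, float]] = {}
    for elem in root.iter():
        if local_name(elem.tag) != "PredictiveModelQuality":
            continue
        # PMML treats a missing dataUsage attribute as "training".
        usage = elem.get("dataUsage", "training")
        numeric = {}
        for name, value in elem.attrib.items():
            try:
                numeric[name] = float(value)
            except ValueError:
                pass  # skip text attributes such as targetField or dataUsage
        by_usage.setdefault(usage, {}).update(numeric)
    return by_usage
```

A model with only training statistics returns a dictionary without a "test" key, which is what a test-sample-only comparison would need to check.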
The downstream PMML model node must be linked, and its link option must be selected.
PMML is generated when the comparison is run.
The downstream node is the champion model, that is, the model that is updated.
Upstream nodes can be of any node type.
Input
- Connect the Model Comparison node to the model nodes and the data input. The input data used for comparison should specify Observed and Predicted variables, which you can set with the Select Variables node.
For classification
Select the observed variable in the dependent categorical field and the prediction variable in the predictor categorical field. The node generates model quality statistics from the observed and predicted columns.
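The classification statistics listed under Options can be computed from the observed and predicted columns. This sketch assumes the misclassification error rate is the share of mismatched rows, and that the chi-square and G-square statistics are the Pearson and likelihood-ratio statistics of the observed-by-predicted cross-tabulation. Statistica's exact definitions may differ, and classification_statistics is a hypothetical name.

```python
import math
from collections import Counter


def classification_statistics(observed, predicted):
    """Misclassification rate plus chi-square and G-square of the crosstab."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    n = len(observed)
    errors = sum(o != p for o, p in zip(observed, predicted))

    cells = Counter(zip(observed, predicted))  # observed-by-predicted counts
    row_totals = Counter(observed)
    col_totals = Counter(predicted)

    chi_square = 0.0
    g_square = 0.0
    for o in row_totals:
        for p in col_totals:
            expected = row_totals[o] * col_totals[p] / n  # always > 0 here
            count = cells.get((o, p), 0)
            chi_square += (count - expected) ** 2 / expected
            if count > 0:  # empty cells contribute 0 to G-square
                g_square += 2 * count * math.log(count / expected)

    return {"misclassification_rate": errors / n,
            "chi_square": chi_square,
            "g_square": g_square}
```

For observed ["a", "a", "b", "b"] and predicted ["a", "b", "b", "b"], one row of four is wrong, so the misclassification rate is 0.25.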
For regression
Select the observed variable in the dependent continuous field and the prediction variable in the predictor continuous field. The node generates model quality statistics from the observed and predicted columns.
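The regression statistics listed under Options can likewise be computed from the two columns. This is a minimal sketch assuming residuals are observed minus predicted, "Absolute mean error" is the mean of the absolute residuals, and R squared is 1 - SSE/SST. These definitions are assumptions; Statistica may, for example, compute R squared as a squared correlation. The function name is hypothetical.

```python
import math


def regression_statistics(observed, predicted):
    """Error and fit statistics from observed and predicted columns."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]  # observed - predicted
    sse = sum(r * r for r in residuals)
    mean_observed = sum(observed) / n
    sst = sum((o - mean_observed) ** 2 for o in observed)
    return {
        "mean_error": sum(residuals) / n,
        "absolute_mean_error": sum(abs(r) for r in residuals) / n,
        "root_mean_squared_error": math.sqrt(sse / n),
        "sum_of_squared_error": sse,
        # R squared is undefined when the observed values are constant.
        "r_squared": 1 - sse / sst if sst > 0 else float("nan"),
    }
```

For observed [1, 2, 3, 4] and predicted [1, 2, 3, 5], the only residual is -1, so the sum of squared error is 1 and R squared is 1 - 1/5 = 0.8.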
Enterprise
- Specify an Enterprise model using the PMML Model node from the Data Mining menu.
- Compare multiple Enterprise models
Comparing to Enterprise is the default. The node retrieves the existing model from Enterprise and checks whether the selected model is different. If it is, the model in Enterprise is updated.
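The update-if-different behavior can be sketched as follows. How Statistica decides that two models differ is not described here. This sketch assumes a comparison of canonicalized PMML text (C14N 2.0 through the standard library's canonicalize, Python 3.8+), and fetch_deployed and deploy are hypothetical callables standing in for the Enterprise repository.

```python
from typing import Callable, Optional
from xml.etree.ElementTree import canonicalize


def sync_champion(champion_pmml: str,
                  fetch_deployed: Callable[[], Optional[str]],
                  deploy: Callable[[str], None]) -> bool:
    """Deploy the champion only if it differs from the deployed model.

    Returns True when an update was pushed, False when nothing changed.
    """
    deployed = fetch_deployed()  # None means no model is deployed yet
    if deployed is not None:
        # Canonical form ignores attribute order, quoting style and
        # whitespace-only text, so purely cosmetic changes do not count.
        same = (canonicalize(deployed, strip_text=True)
                == canonicalize(champion_pmml, strip_text=True))
        if same:
            return False
    deploy(champion_pmml)
    return True
```

If the PMML header carries a timestamp, every comparison would report a difference, so a real check would likely exclude such elements; that refinement is left out of this sketch.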
Output
The output from running this node is an XML spreadsheet.
The output shows model statistics that come from the PMML node, computed on the input data.
You can have Enterprise nodes updated automatically: on the PMML tab of the PMML Model node (Data Mining menu), select the Link to Enterprise check box, and then click the Deploy to Enterprise button.
When these options are selected, a new downstream node is created for the best selected model, and that model is updated to Enterprise.
The output differs depending on the comparison scenario.
Options:
Specifications/Quick tab
Model Type
Auto: the node tries to detect the model type automatically from the connected nodes and compares the models using the selected statistics. You can also select one of the following:
- Classification
- Regression
Regression comparison statistics
Select the statistic used to compare regression models. The model with the lowest error, or the highest R squared, is selected as the best model. These are the choices:
- Mean error
- Absolute mean error
- Root mean squared error
- Sum of squared error
- R squared
Classification comparison statistics
Select the statistic used to compare classification models:
- Misclassification error rate
- Chi-square statistic
- G-square statistic
Use test sample statistics only for selecting model
Select this check box to use only test sample statistics when selecting a model. When it is selected, only models that have test statistics, and models that use data files, are compared; training statistics are ignored. For example, the Advanced Classification Trees (C&RT) model does not have a test Data usage tag, so it is not used in the comparison.
Results/Quick tab
Model statistics
Results/Enterprise tab
Update selected model to Enterprise
Select this check box to update the selected model to Enterprise automatically. The downstream model node must be linked to Enterprise, with the link option selected. The model is updated only if the selected model differs from the model in Enterprise.
This option uses the data supplied by the connected Select Variables dialog box (Data tab, Variables icon/select variables). The input data used for comparison should specify Observed and Predicted variables.
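The selection rule for the comparison statistics, together with the test-sample-only option, can be sketched in Python. Assumptions: statistics are plain dictionaries keyed by data usage (the way a PMML dataUsage tag labels them); error statistics are compared by magnitude, so a signed mean error closer to zero wins; and when test statistics are not required, a model's test statistics are preferred over its training statistics if both exist. Chi-square and G-square are left out because the section does not say whether larger or smaller values are better, and the "models using data files" case is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical statistic keys; lower error is better, higher R squared is better.
LOWER_IS_BETTER = {"mean_error", "absolute_mean_error", "root_mean_squared_error",
                   "sum_of_squared_error", "misclassification_rate"}
HIGHER_IS_BETTER = {"r_squared"}


@dataclass
class CandidateModel:
    name: str
    # e.g. {"training": {"r_squared": 0.9}, "test": {"r_squared": 0.8}}
    stats: dict[str, dict[str, float]] = field(default_factory=dict)


def select_champion(models: list[CandidateModel], statistic: str,
                    test_only: bool = False) -> CandidateModel:
    """Return the best model; ties go to the first model in the list."""
    if statistic in LOWER_IS_BETTER:
        score = abs  # smaller magnitude is better (matters for signed mean error)
    elif statistic in HIGHER_IS_BETTER:
        score = lambda value: -value  # negate so that min() picks the largest
    else:
        raise ValueError(f"unsupported statistic: {statistic}")

    usages = ("test",) if test_only else ("test", "training")
    scored = []
    for model in models:
        for usage in usages:
            value = model.stats.get(usage, {}).get(statistic)
            if value is not None:
                scored.append((score(value), model))
                break  # use the first available usage for this model
    if not scored:
        raise ValueError(f"no model reports {statistic!r} for {usages}")
    return min(scored, key=lambda pair: pair[0])[1]
```

In test-only mode, a model without test statistics, such as the C&RT example in the option description, simply drops out of the comparison.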
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.