Advanced Comprehensive Regression Models
- Select the Data Mining tab.
- In the Tools group, click Workspaces.
- From the General Modeler and Multivariate Explorer submenu, select Advanced Comprehensive Regression Models to display a project workspace (template) with several prearranged nodes. These nodes fit linear, nonlinear, regression-tree, CHAID and Exhaustive CHAID models, as well as several neural network architectures, to a continuous dependent variable (a regression problem), and automatically generate deployment information for predicting new observations.
The Compute Best Prediction From all Models node offers several options for computing predicted values: it can average the predictions from all models, or from only the best-fitting models (of those included in the project during training).
This project can be further augmented with any of the nodes available in the Regression Modeling and Multivariate Exploration folder of the Node Browser (for fitting models and automatically generating deployment information), or with any other nodes available in Statistica Data Miner.
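The averaging idea can be sketched outside Statistica in a few lines of Python. This is only an illustration of the concept, not the node's actual implementation: the function name `combine_predictions`, the `best_k` parameter, the model names, and all numbers are hypothetical, and "best-fitting" is assumed here to mean lowest error on the testing sample.

```python
from statistics import mean


def combine_predictions(predictions, test_errors, best_k=None):
    """Average per-observation predictions across models.

    predictions: dict mapping model name -> list of predicted values
                 (all lists must have the same length)
    test_errors: dict mapping model name -> error on the testing sample
                 (assumption: lower means better fit)
    best_k:      if given, average only the best_k models with the
                 lowest test error; otherwise average all models
    """
    # Rank models from best-fitting to worst-fitting.
    names = sorted(predictions, key=lambda name: test_errors[name])
    if best_k is not None:
        names = names[:best_k]
    # zip(*...) groups the predictions for each observation together.
    columns = zip(*(predictions[name] for name in names))
    return [mean(values) for values in columns]


# Hypothetical predictions for three new observations.
predictions = {
    "linear": [10.0, 20.0, 30.0],
    "tree": [12.0, 18.0, 33.0],
    "neural_net": [14.0, 25.0, 36.0],
}
# Hypothetical testing-sample errors for the same models.
test_errors = {"linear": 2.5, "tree": 1.0, "neural_net": 4.0}

average_all = combine_predictions(predictions, test_errors)
average_best_two = combine_predictions(predictions, test_errors, best_k=2)
```

With these made-up numbers, `average_best_two` uses only the tree and linear models, because the neural network has the highest testing-sample error.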
See Data Mining with Statistica, in particular the topic Structure and User Interface of Statistica Data Miner, for additional details.
See also the Statistica Data Miner Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable.
To use the prearranged project as-is:
- Select a data source.
- Specify the continuous dependent variable and the continuous and categorical predictor variables.
- Connect the input data to the node labeled Split Input Data into Training and Testing Samples.
- Run the project.
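For readers unfamiliar with the purpose of the split node, the following Python sketch shows the general idea of dividing input data into training and testing samples. It is an assumption-laden illustration: the function `split_train_test`, the 25% testing fraction, and the random shuffling are hypothetical choices, and the sampling method that Statistica actually uses is not described here.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Randomly split rows into (training, testing) samples.

    test_fraction and seed are illustrative defaults, not
    Statistica settings.
    """
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(rows)          # copy so the input is left unchanged
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical input: 20 observations identified by row number.
rows = list(range(20))
train, test = split_train_test(rows)
```

Models are fitted on `train` only, and `test` is held back so that the fit of each model can be compared on observations it has not seen.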
To predict new values based on the fitted models:
- Connect a data file with new observations to the project.
- Select the same variables as before (even if no valid values are available for the continuous dependent variable).
- Mark the input data as data for deployment (in the Select Dependent Variables and Predictors dialog box, select the option Data for deployment project and do not re-estimate models).
- Connect the new input data source to the node labeled Compute Best Prediction From all Models.
- Run that node or update (Run) the entire project.
For a step-by-step example of this process, see Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable.
Various options are available in the Edit Parameters dialog box for the Compute Best Prediction From all Models node. These options determine exactly how the predictions from the different models are combined.