Data Mining ... Quick Linear Models Project

  1. Select the Data Mining tab.
  2. In the Tools group, click Workspaces.
  3. From the General Classifier (Trees and Clusters) submenu or the General Modeler and Multivariate Explorer submenu, select Quick Linear Models Project to display a project workspace (template) with several prearranged nodes for the following tasks:
    • Fitting a linear model to a regression problem (with a continuous dependent variable)
    • Automatically generating deployment information (for predicting new observations)
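
The fitting task above can be summarized by a generic linear model. The form below is a textbook sketch, not Statistica's documented formulation. It assumes p continuous predictors x_1, ..., x_p and a single categorical predictor with K levels, coded as 0/1 indicator variables d_1, ..., d_{K-1} with level K as the reference. These symbols are introduced here for illustration, and the parameterization Statistica actually uses for categorical predictors may differ.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \sum_{k=1}^{K-1} \gamma_k d_{ik} + \varepsilon_i,
\qquad
d_{ik} =
\begin{cases}
1 & \text{if observation } i \text{ falls in level } k,\\
0 & \text{otherwise,}
\end{cases}

\hat{\boldsymbol{\theta}} = (X^\top X)^{-1} X^\top \mathbf{y},
\qquad
\boldsymbol{\theta} = (\beta_0, \beta_1, \dots, \beta_p, \gamma_1, \dots, \gamma_{K-1})^\top
```

Here X is the design matrix with columns 1, x_1, ..., x_p, d_1, ..., d_{K-1}. The closed-form least-squares estimate exists only when X^T X is invertible, that is, when no column is an exact linear combination of the others.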

You can extend this project with any of the nodes in the Regression Modeling and Multivariate Exploration folder of the Node Browser (for fitting models and automatically generating deployment information), or with other nodes available in Statistica Data Miner.

For additional details, see Data Mining with Statistica, in particular the topic Structure and User Interface of Statistica Data Miner; see also Statistica Data Miner Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable.

To use the prearranged project as-is:

  1. Select a data source.
  2. Specify the continuous dependent variable and the continuous and categorical predictor variables.
  3. Connect the input data to the node labeled Split Input Data into Training and Testing Samples.
  4. Run the project.
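
The core of what the prearranged project does (split the data, fit a linear model on the training sample, check it on the testing sample) can be sketched outside Statistica in plain Python. This is a minimal illustration under stated assumptions, not Statistica's implementation. It uses ordinary least squares via the normal equations, a random 70/30 training/testing split, and 0/1 indicator coding for one categorical predictor. The helper names (`design_row`, `solve`, `fit_ols`, `split`, `mse`) are hypothetical.

```python
import random

# Hypothetical helpers for illustration; they are not part of any Statistica API.

def design_row(row, cont_keys, cat_key, levels):
    # Intercept, continuous predictors, then a 0/1 indicator for every
    # categorical level except levels[0], which serves as the reference level.
    x = [1.0] + [float(row[k]) for k in cont_keys]
    x += [1.0 if row[cat_key] == lvl else 0.0 for lvl in levels[1:]]
    return x

def solve(a, b):
    # Gaussian elimination with partial pivoting for the square system a * beta = b.
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def fit_ols(X, y):
    # Ordinary least squares via the normal equations (X^T X) beta = X^T y.
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, x):
    return sum(b * xi for b, xi in zip(beta, x))

def split(rows, test_fraction=0.3, seed=0):
    # Random split into (training, testing) samples; 30% testing is an assumption.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mse(beta, rows, dep, cont_keys, cat_key, levels):
    # Mean squared prediction error of the fitted model on the given rows.
    errs = [(r[dep] - predict(beta, design_row(r, cont_keys, cat_key, levels))) ** 2
            for r in rows]
    return sum(errs) / len(errs)

# Usage on synthetic, noise-free data: y = 2 + 3*x + 5*[group == "b"].
rows = [{"x": float(i), "group": "a" if i % 2 == 0 else "b",
         "y": 2.0 + 3.0 * i + (5.0 if i % 2 else 0.0)} for i in range(20)]
levels = ["a", "b"]
train_rows, test_rows = split(rows)
beta = fit_ols([design_row(r, ["x"], "group", levels) for r in train_rows],
               [r["y"] for r in train_rows])
print(beta)  # close to [2.0, 3.0, 5.0] because the data contain no noise
print(mse(beta, test_rows, "y", ["x"], "group", levels))  # close to 0
```

Solving the normal equations directly keeps the sketch short. A production implementation would typically prefer a QR decomposition for better numerical stability.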

To predict new values based on the fitted linear model:

  1. Connect a data file with new observations to the project.
  2. Select the same variables as before (even if no valid values are available for the continuous dependent variable).
  3. Mark the input data as data for deployment by selecting the option Data for deployment project; do not re-estimate models on the Select Dependent Variables and Predictors dialog.
  4. Connect the new input data source to the node labeled Compute Best Prediction From all Models.
  5. Run that node or update (Run) the entire project.

For a step-by-step example of this process, see Example 3: Predictive Data Mining and Deployment for a Continuous Output Variable.
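
The deployment steps above can be illustrated with a small Python sketch. It makes two assumptions. First, each fitted model is stored as a set of coefficients together with its error on the testing sample. Second, "best" means the model with the lowest testing-sample error. The Compute Best Prediction From all Models node may define "best" or combine predictions differently, so treat this only as an illustration of the idea. All names (`linear_predict`, `best_prediction`, the model dictionaries) and the error values are hypothetical.

```python
# Hypothetical deployment sketch; not Statistica's implementation.

def linear_predict(coefs, row):
    # coefs maps term names to coefficients, e.g.
    # {"intercept": b0, "x": b1, "group=b": b2}; "var=level" denotes a 0/1 indicator.
    total = coefs["intercept"]
    for name, b in coefs.items():
        if name == "intercept":
            continue
        if "=" in name:
            var, level = name.split("=", 1)
            total += b if row[var] == level else 0.0
        else:
            total += b * row[name]
    return total

def best_prediction(models, new_rows):
    # Assumption: "best" is the model with the lowest testing-sample error.
    best = min(models, key=lambda m: m["test_mse"])
    return best["name"], [linear_predict(best["coefs"], row) for row in new_rows]

# Illustrative, made-up models and error values.
models = [
    {"name": "linear_a", "test_mse": 0.8,
     "coefs": {"intercept": 2.0, "x": 3.0, "group=b": 5.0}},
    {"name": "linear_b", "test_mse": 1.4,
     "coefs": {"intercept": 1.0, "x": 3.5}},
]
# New observations keep the dependent variable "y", but its values are missing
# (None); predictions never read it, so missing values do no harm.
new_rows = [{"x": 1.0, "group": "a", "y": None},
            {"x": 4.0, "group": "b", "y": None}]
name, preds = best_prediction(models, new_rows)
print(name, preds)  # linear_a [5.0, 19.0]
```

The missing dependent values reflect the instruction to select the same variables as before even when the dependent variable has no valid values: the variable must be present for the model's inputs to line up, but only the predictors are used to compute predictions.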