Large Variable List for Input
This dialog box is displayed when you specify many thousands of variables for the data source in the Select Dependent Variables and Predictors dialog box. For most analyses, it is not efficient to process many thousands of variables; however, Statistica generally imposes no limit on the number of variables (columns) in a data file, and for some data mining applications, tens or even hundreds of thousands of predictors are not uncommon. With Statistica Data Miner, you choose how to handle such large analysis problems to optimize the performance of the program.
Option | Description |
---|---|
Use Feature Selection and Variable Screening to extract likely best predictors from among all predictors | Applies automatic screening and feature selection to the input data. After you exit this dialog box using the OK button, a Feature Selection and Variable Screening node is automatically inserted into the workspace. The operations performed by that node depend on the selection of variables in the Select Dependent Variables and Predictors dialog box. In all cases, the node creates a mapping of variables for subsequent analyses. |
Data sources for predictive data mining | If you specify dependent (output) and independent (input) variables for the data source in the Select Dependent Variables and Predictors dialog box (that is, if you are building a model for predictive data mining), the program automatically applies the methods of the Feature Selection and Variable Screening module of Statistica. This module can efficiently screen hundreds of thousands or even more than a million predictors to find a subset of continuous and categorical predictors that is likely to be related to the dependent variables of interest. By default, the screening mechanism does not bias the selection of predictors in favor of any type of analysis or model that might subsequently be applied to finalize a predictive model. |
Data sources for exploratory data analysis (EDA) | If you specify only dependent variables and no predictors in the Select Dependent Variables and Predictors dialog box (that is, if the variables are intended for exploratory and descriptive data analysis), the program automatically splits the list of (thousands of) variables into multiple smaller input data sources. This allows for more efficient processing of the different lists of variables. |
Select smaller subsets manually | Selects a smaller subset of variables from the large list of variables selected for the analyses. This subsetting creates a mapping of the chosen variables into the original data source and does not create a new data file. Hence, this is a very efficient method to select a smaller list of variables from a very large list. |
Use a custom method for feature extraction/filtering, to select a subset of interest | Inserts a feature selection or variable screening template node into the workspace. You can then fill in the Statistica Visual Basic code to perform the desired screening operation on the large variable lists chosen for the input data source. |
OK | Accepts the selection and closes this dialog box. |
Cancel | Closes this dialog box. Any selections made are disregarded. |
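
The ideas behind the screening and chunking options described in the table can be illustrated with a minimal Python sketch. This is not Statistica's algorithm and not Statistica Visual Basic code: the function names (`chunk_variables`, `abs_correlation`, `screen_predictors`), the choice of absolute Pearson correlation as the ranking criterion, and the sample data are all assumptions made for illustration only. The sketch returns lists of variable names that refer back to the original columns, rather than copying any data.

```python
from math import sqrt


def chunk_variables(variables, chunk_size):
    """Split a long variable list into consecutive smaller lists."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [variables[i:i + chunk_size]
            for i in range(0, len(variables), chunk_size)]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def screen_predictors(columns, dependent, predictors, keep):
    """Return the names of the `keep` predictors most correlated with `dependent`.

    The result is a list of names that map into `columns`; no data is copied.
    """
    target = columns[dependent]
    scores = {name: abs_correlation(columns[name], target) for name in predictors}
    ranked = sorted(predictors, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


# Tiny illustrative data set (values are made up for this sketch).
columns = {
    "y":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "x1": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly related to y
    "x2": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to y
    "x3": [7.0, 7.0, 7.0, 7.0, 7.0],   # constant, carries no information
}

# Predictive case: keep the two predictors most related to the dependent variable.
selected = screen_predictors(columns, "y", ["x1", "x2", "x3"], keep=2)

# Exploratory case: split a variable list into smaller lists for separate processing.
chunks = chunk_variables(["v1", "v2", "v3", "v4", "v5"], chunk_size=2)
```

A univariate criterion such as correlation is used here only because it is simple to read; a real screening method would also need to handle categorical predictors and categorical dependent variables, which this sketch does not attempt.
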
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.