SANN - Data Selection

You can click the OK button in the SANN - Analysis/Deployment Startup Panel to display the SANN - Data Selection dialog box.

This dialog box contains up to four tabs, depending on the type of analysis selected: Quick, Sampling, Subsampling, and Time series. The Time series tab is available only for time series analysis. The options described below are available regardless of which tab is selected.

Option Description
OK Displays the dialog box for the strategy selected on the Quick tab (either the SANN - Automated Network Search (ANS) dialog box or the SANN - Custom Neural Network dialog box).
Note: If you have not already specified Variables, a standard variable selection dialog box is displayed first.
Cancel Closes the dialog box and returns to the SANN - New Analysis/Deployment Startup Panel.
Options See Options Menu for descriptions of the commands on this menu.
MD handling (inputs) This group box specifies how cases with missing values in the input variables of the selected models are handled. It is always disabled for Time series analysis. There are two options:
  • Casewise: Cases with missing input values are omitted when generating results. Cases with one or more missing target values are labeled as Missing and form the Missing sample. This option is not available for Time series tasks. Casewise is the only method available for handling missing data in categorical variables.
  • Mean substitution: Missing values are replaced with the training sample mean before the network is trained or executed. This option applies only to continuous variables, so it is disabled when there are no continuous inputs in the analysis, and for classification tasks it cannot be applied to the target variable. Instead, all cases with a missing target value are labeled as Missing and grouped in SANN as the Missing sample, which can be used to fix the basis functions of RBF neural networks and for making predictions. Mean substitution is not applicable to time series analysis (whether regression or classification).
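The two options can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than SANN's implementation: it assumes each case is a dictionary of variable values with None marking a missing value, that all inputs are continuous, and that the training sample is simply every case with a known target (in practice SANN draws a training subset from these). The names casewise and mean_substitution are invented for this example.

```python
from statistics import mean

# Hypothetical sketch, not Statistica SANN code. A case is a dict mapping
# variable names to values; None marks a missing value. All inputs are
# assumed to be continuous.

def casewise(cases, inputs, target):
    """Casewise: cases with a missing target form the Missing sample;
    of the remaining cases, any case with a missing input is omitted."""
    kept, missing_sample = [], []
    for case in cases:
        if case[target] is None:
            missing_sample.append(case)
        elif all(case[name] is not None for name in inputs):
            kept.append(case)
    return kept, missing_sample


def mean_substitution(train, cases, inputs):
    """Return copies of the cases with each missing input replaced by its
    simple (unweighted) mean over the observed training-sample values."""
    means = {
        name: mean(c[name] for c in train if c[name] is not None)
        for name in inputs
    }
    return [
        {k: (means[k] if k in means and v is None else v) for k, v in c.items()}
        for c in cases
    ]


cases = [
    {"x1": 1.0, "x2": 10.0, "y": 5.0},
    {"x1": None, "x2": 20.0, "y": 6.0},
    {"x1": 3.0, "x2": None, "y": 7.0},
    {"x1": 2.0, "x2": 30.0, "y": None},  # missing target -> Missing sample
]
# Assumption for this example: the training sample is every case with a known target.
train = [c for c in cases if c["y"] is not None]

kept, missing_sample = casewise(cases, ["x1", "x2"], "y")
filled = mean_substitution(train, cases, ["x1", "x2"])
```

Here casewise keeps only the first case and puts the last one in the Missing sample, while mean substitution fills the missing x1 with 2.0 and the missing x2 with 15.0 (the training-sample means) and leaves the missing target untouched.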
Case selection Displays the Analysis/Graph Case Selection Conditions dialog box, which is used to create conditions that specify which cases are included in or excluded from the current analysis. More information is available in the case selection conditions overview, syntax summary, and dialog box description.
Case weights Displays the Analysis/Graph Case Weights dialog box, which is used to adjust the contribution of individual cases to the outcome of the current analysis by weighting those cases in proportion to the values of a selected variable. In Statistica SANN, case weights are used to make a network emphasize or ignore specific cases, or even regions, of the data set. By default, all data cases have a case weight of 1. If a data case is assigned a case weight less than 1, for example 0.5, the error due to misfitting that case is halved, so the network places less emphasis on learning it because prediction errors on it carry a smaller penalty. Similarly, a neural network will fit more closely to a data case with a weight of, say, 2, since the error due to predictions on that case is doubled.
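The effect of case weights on the training error can be illustrated with a weighted sum-of-squares error. This is a hedged sketch that assumes each case's squared error is simply multiplied by its weight; the exact error function SANN uses (for example, sum of squares or cross-entropy) depends on the network and is not specified here, and weighted_sse is a hypothetical name.

```python
# Hypothetical sketch: each case's squared error is scaled by its case weight.
# This is an assumed error form for illustration, not SANN's internal code.

def weighted_sse(targets, predictions, weights=None):
    """Weighted sum of squared errors; every case defaults to weight 1."""
    if weights is None:
        weights = [1.0] * len(targets)
    return sum(w * (t - p) ** 2 for t, p, w in zip(targets, predictions, weights))


targets = [1.0, 2.0, 3.0]
predictions = [1.5, 2.0, 2.0]

unweighted = weighted_sse(targets, predictions)             # 0.25 + 0 + 1 = 1.25
down = weighted_sse(targets, predictions, [0.5, 1, 1])      # first case halved: 1.125
up = weighted_sse(targets, predictions, [1, 1, 2])          # third case doubled: 2.25
```

Halving the first case's weight halves its 0.25 contribution, and doubling the third case's weight doubles its contribution, so a training algorithm minimizing this error is pushed harder to fit the third case.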
Note: The mean substitution option always computes the simple arithmetic mean to replace missing data, even when weights are in effect, because weights in SANN are interpreted as measures of case importance and affect only the estimation of the neural network parameters. If you intend the weights to produce a weighted mean (for example, a population average computed using weights) for replacing missing data in the input file, use Data - Data Filtering/Recoding - Replace Missing Data to replace missing values with weighted means.
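The difference between the simple mean used by mean substitution and a weighted mean can be seen with a small made-up variable. This is an illustrative sketch only, with invented data; it does not reproduce how Replace Missing Data computes its result.

```python
# Hypothetical example data, not from SANN: one input variable with a
# missing value (None) and a case-weight variable.
values = [10.0, None, 20.0, 40.0]
weights = [1.0, 1.0, 3.0, 0.5]

# Keep only the observed (non-missing) values together with their weights.
observed = [(v, w) for v, w in zip(values, weights) if v is not None]

# Mean substitution ignores the weights: simple arithmetic mean, 70 / 3.
simple_mean = sum(v for v, _ in observed) / len(observed)

# Weighted mean of the observed values: 90 / 4.5 = 20.
weighted_mean = sum(v * w for v, w in observed) / sum(w for _, w in observed)

simple_filled = [simple_mean if v is None else v for v in values]
weighted_filled = [weighted_mean if v is None else v for v in values]
```

The missing value would be filled with about 23.33 by the simple mean but with 20.0 by the weighted mean, which is why the note recommends a separate step when a weighted replacement is intended.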
Note: Weights in SANN are used and interpreted as measures of case importance, which means they affect the estimation of the neural network parameters and nothing else. For example, case weights are not used in mean substitution of missing data or in calculating data statistics such as the mean and standard deviation of the variables. If you assign weights to cases in the data set, the neural network algorithm will try to predict cases with higher weights more accurately. This is useful in a number of situations, such as imbalanced data or data sets containing cases that are more important to predict accurately. Data cases with zero weights are excluded from the train, test, and validation samples (that is, they are ignored in the analysis). Case weights can be integers or fractional numbers.
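The exclusion of zero-weight cases can be sketched as a filter applied before cases are divided into samples. This is a hypothetical illustration: the random split and the 70/15/15 proportions are assumptions chosen for the example, not SANN defaults, and split_samples is an invented name.

```python
import random

# Hypothetical sketch, not SANN code: a case is a dict with an "id" and a
# "weight"; weights may be integers or fractional numbers.

def split_samples(cases, train_frac=0.7, test_frac=0.15, seed=0):
    """Drop zero-weight cases, then randomly partition the remaining cases into
    train, test, and validation samples. The validation sample receives
    whatever remains. Proportions and the random split are assumptions."""
    active = [c for c in cases if c["weight"] != 0]  # zero weight -> excluded
    rng = random.Random(seed)
    rng.shuffle(active)
    n_train = round(train_frac * len(active))
    n_test = round(test_frac * len(active))
    train = active[:n_train]
    test = active[n_train:n_train + n_test]
    validation = active[n_train + n_test:]
    return train, test, validation


weights = [1, 0, 2.5, 0.5, 0, 1, 1, 3, 1, 1, 0.0, 2]
cases = [{"id": i, "weight": w} for i, w in enumerate(weights)]
train, test, validation = split_samples(cases)
```

Of the twelve cases, the three with weight 0 never reach any sample; the remaining nine, with integer and fractional weights alike, are divided 6/1/2 among the train, test, and validation samples.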