SANN - Subsampling Dialog Box and Quick Tab

Select the Subsampling option button on the Quick tab of the SANN - Data selection dialog box and click OK to display the SANN - Subsampling dialog box.

In many ways, the Subsampling dialog box is similar to the Custom Neural Network (CNN) dialog box. As with CNN, you can specify options in the Subsampling dialog box to create individual neural networks with full specifications, such as size and architecture. The difference is that, while all networks created via CNN in one analysis use the same train, test, and validation samples, the Subsampling dialog box lets you create multiple neural networks, each on a different set of train, test, and validation samples.
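The idea of giving each network its own train, test, and validation samples can be sketched as follows. This is a minimal illustration, not the SANN implementation: the function name subsample_splits, the 70/15/15 proportions, and the use of plain random sampling without replacement are all assumptions made for the example.

```python
import random

def subsample_splits(n_cases, n_networks, train=0.70, test=0.15, seed=0):
    """Draw one independent train/test/validation partition per network.

    The proportions and the sampling scheme (random permutation, no
    replacement) are assumptions for illustration only.
    """
    rng = random.Random(seed)
    n_train = round(n_cases * train)
    n_test = round(n_cases * test)
    splits = []
    for _ in range(n_networks):
        cases = list(range(n_cases))
        rng.shuffle(cases)  # a fresh permutation of case indices per network
        splits.append({
            "train": sorted(cases[:n_train]),
            "test": sorted(cases[n_train:n_train + n_test]),
            "validation": sorted(cases[n_train + n_test:]),
        })
    return splits

splits = subsample_splits(n_cases=20, n_networks=3)
for i, s in enumerate(splits, start=1):
    print(i, len(s["train"]), len(s["test"]), len(s["validation"]))
```

Every network sees the same cases overall, but the assignment of cases to the three samples differs from one network to the next, which is the contrast with CNN described above.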

Depending on the analysis type selected in the Startup Panel (and in some cases, the selected network type), this dialog box can contain up to five tabs simultaneously. The available tabs are: Quick (options described below), MLP, RBF, Weight Decay, Initialization, and Real Time Training Graph. Use the options on these tabs to configure the neural networks. Note that you can also access the Subsampling dialog box from the SANN Results dialog boxes.

Option Description
Active neural networks The grid in the Active neural networks group box provides a quick view of the networks you have created for modeling your data. If you have not trained any networks or if you have not selected any active networks, this grid remains empty.
Train Builds or trains networks according to the specifications made on the tabs of this dialog box. While the networks are being trained, the Neural networks training in progress dialog box is displayed. This dialog box provides summary details of networks as they are created. When the requested number of networks has been trained, the SANN - Results dialog box is displayed.
Go to results Displays the Results dialog box without performing additional training. Note that if the Active neural networks grid is empty, this button is not available.
Save networks Displays a drop-down list containing the following commands.
PMML Displays the Save PMML file dialog box, which contains options to store the active networks for future use. Note that this dialog box is displayed only when the Active neural networks grid contains networks. Stored PMML networks can be opened by clicking the Load network files button in the SANN - New Analysis/Deployment Startup Panel.
C/C++ language Displays the Save C file dialog box, which contains options to store the active networks for future use.
C# Generates code as C#.
Java Generates code in Java.
SAS Displays the Save SAS file dialog box, which contains options to save deployment code for the created model as SAS code (a .sas file).
SQL stored procedure in C# Generates code as a C# class intended for use in a SQL Server user defined function.
SQL User Defined Function in C# Generates code as a C# class intended for use as a SQL Server user defined function.
Teradata Generates code as a C language function intended for use as a user-defined function in a Teradata querying environment.
Deployment to Statistica Enterprise Deploys the results as an Analysis Configuration in Statistica Enterprise. Note that appropriately formatted data must be available in a Statistica Enterprise Data Configuration before the results can be deployed to an Analysis Configuration.
Data statistics Generates a spreadsheet containing the mean, standard deviation, minimum value, and maximum value for each continuous variable in the analysis. These data statistics are broken down by sample (training, testing, and validation) and are also reported for the overall data set. Since Subsampling assigns different train, test, and validation samples to each network, this option generates as many spreadsheets as there are active networks.
Summary Generates a spreadsheet containing the summary details listed in the Active neural networks grid. Note that if the Active neural networks grid is empty, this button is not available.
Cancel Exits the SANN - Subsampling dialog box and returns to the SANN - Data selection dialog box. Any selections made are disregarded, and you are prompted to discard any networks in the Active neural networks grid.
Options Displays the Options menu.
Quick Tab: Select the Quick tab of the SANN - Subsampling dialog box to access the following options.
Network type Specifies the type of network (multilayer perceptron or radial basis function).
Multilayer perceptron (MLP) Generates multilayer perceptron networks. The multilayer perceptron is the most common form of network. It requires iterative training, but the resulting networks are quite compact, execute quickly once trained, and in most problems yield better results than the other types of networks.
Radial basis function (RBF) Generates radial basis function networks. Radial basis function networks tend to be slower and larger than multilayer perceptron networks and often have inferior performance, but they can be trained faster than MLP networks for large data sets and linear output activation functions.
Error function Specifies the error function to be used in training a network.
Sum of squares Select the Sum of squares option button to generate networks using the sum of squares error function. Note that this is the only error function available for regression type analyses.
Cross entropy Select the Cross entropy check box to generate networks using the cross entropy error function. This error function assumes that the data are drawn from the exponential family of distributions and supports a direct probabilistic interpretation of the network outputs. Note that this error function is available only for classification problems; it is disabled for regression type analyses. When the Cross entropy error function is selected, the activation function for the output neurons (in the Activation functions group box) is always set to Softmax.
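As a rough illustration of the two error functions, the sketch below computes each for a single case. The function names are hypothetical, and scaling conventions (for example, a factor of 1/2 or averaging over cases) vary between implementations, so this should not be read as the exact formula SANN uses.

```python
import math

def sum_of_squares(targets, outputs):
    """Sum of squared differences between targets and network outputs."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs))

def cross_entropy(targets, outputs, eps=1e-12):
    """Cross entropy for one-of-N targets and probability-like outputs.

    eps guards against log(0); its value is an arbitrary choice here.
    """
    return -sum(t * math.log(max(y, eps)) for t, y in zip(targets, outputs))

# One classification case: the true class is the second of three.
targets = [0.0, 1.0, 0.0]
outputs = [0.2, 0.7, 0.1]   # e.g. Softmax outputs that sum to 1

sos = sum_of_squares(targets, outputs)
ce = cross_entropy(targets, outputs)
print(sos, ce)
```

With one-of-N targets, the cross entropy reduces to the negative log of the output assigned to the true class, which is why it pairs naturally with outputs that can be read as class probabilities.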
Activation functions Use the options in this group box to select activation functions for the hidden and output neurons. The choice of activation function, that is, the precise mathematical function a neuron applies to its activation level, is crucial in building a neural network model because it is directly related to the performance of the model. Generally, it is recommended that you choose the Tanh and Identity functions for the hidden and output neurons, respectively, for multilayer perceptron networks (default settings) when the Sum of squares error function is used. For Radial basis function networks, the Hidden units are automatically set to Gaussian, and the Output units are set to either Identity (when the Sum of squares error function is used) or Softmax (when the Cross entropy error function is used).
Hidden units This drop-down list is used to select the activation function for the hidden layer neurons. For Multilayer perceptron networks, these include the Identity function, hyperbolic Tanh (recommended), Logistic sigmoid, Exponential, and Sine activation functions. For Radial basis function networks, a Gaussian activation function is always used for hidden neurons.
  • Identity: Uses the identity function. With this function, the activation level is passed on directly as the output.
  • Tanh: Uses the hyperbolic tangent function (recommended). The hyperbolic tangent function (tanh) is a symmetric S-shaped (sigmoid) function whose output lies in the range (-1, +1). It often performs better than the logistic sigmoid function because of its symmetry.
  • Logistic: Uses the logistic sigmoid function. This is an S-shaped (sigmoid) curve, with output in the range (0, 1).
  • Exponential: Uses the exponential activation function.
  • Sine: Uses the standard sine activation function.
  • Gaussian: Uses a Gaussian (or Normal) distribution. This is the only choice available for RBF neural networks.
Output units This drop-down list is used to select the activation function for the output neurons. For Multilayer perceptron networks, these include the Identity function (recommended), hyperbolic Tanh, Logistic sigmoid, Exponential, Sine, and Softmax activation functions. For Radial basis function networks, the choice of Output units depends on the selected Error function. For RBF networks with the Sum of squares error function, an Identity activation function is used. For RBF networks with the Cross entropy error function, the Softmax activation function is always used.
  • Identity: Uses the identity function (recommended). With this function, the activation level is passed on directly as the output.
  • Tanh: Uses the hyperbolic tangent function. The hyperbolic tangent function (tanh) is a symmetric S-shaped (sigmoid) function whose output lies in the range (-1, +1). It often performs better than the logistic sigmoid function because of its symmetry.
  • Logistic: Uses the logistic sigmoid function. This is an S-shaped (sigmoid) curve, with output in the range (0, 1).
  • Exponential: Uses the negative exponential activation function.
  • Sine: Uses the standard sine activation function.
  • Softmax: Uses a specialized activation function for one-of-N encoded classification networks. It performs a normalized exponential (which means the outputs add up to 1). In combination with the Cross entropy error function, it allows Multilayer perceptron networks to be modified for class probability estimation.
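Several of the activation functions listed above have standard textbook definitions, sketched below. The function names are hypothetical and this is not SANN code. The Exponential and Gaussian functions are left out because the text does not pin down their exact form (for example, sign or width parameters).

```python
import math

def identity(a):
    """Pass the activation level through unchanged."""
    return a

def tanh(a):
    """Hyperbolic tangent: symmetric sigmoid with output in (-1, +1)."""
    return math.tanh(a)

def logistic(a):
    """Logistic sigmoid with output in (0, 1), written to avoid overflow."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def sine(a):
    """Standard sine function."""
    return math.sin(a)

def softmax(activations):
    """Normalized exponential over all output neurons; results sum to 1."""
    m = max(activations)                      # subtract max for stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

Unlike the other functions, Softmax acts on the whole output layer at once, since each output is divided by the sum over all outputs. That normalization is what lets the outputs be interpreted as class probabilities.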
No. of neurons Specifies the number of neurons in the hidden layer of the network. The more neurons the hidden layer contains, the more complex and flexible the network becomes.
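One way to see how the hidden layer size drives model complexity is to count the adjustable parameters. The sketch below assumes a fully connected network with a single hidden layer and one bias term per hidden and output neuron; the function name and the example sizes (10 inputs, 3 outputs) are hypothetical.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases in a fully connected, one-hidden-layer MLP."""
    hidden_params = (n_inputs + 1) * n_hidden    # +1 for each hidden bias
    output_params = (n_hidden + 1) * n_outputs   # +1 for each output bias
    return hidden_params + output_params

for h in (2, 8, 32):
    print(h, mlp_parameter_count(n_inputs=10, n_hidden=h, n_outputs=3))
# prints:
# 2 31
# 8 115
# 32 451
```

The count grows linearly with the number of hidden neurons, so a larger hidden layer can fit more complex relationships but also has more parameters to estimate from the training sample.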