Support Vector Machine with Deployment (Regression)

Full-featured implementation of Support Vector Machines (SVM) for regression problems. The final solution is automatically stored for deployment.

General

Element Name	Description
Detail of computed results reported	Specifies the level of detail of the reported results. At the Minimal level, spreadsheets with the analysis summary, model specifications, and descriptive statistics (regression statistics) are displayed. The Comprehensive level adds a spreadsheet of predictions and residuals together with their histogram plots. The All results level additionally displays a spreadsheet (if the 'Creates residual statistics' option is selected) containing all data set variables and their statistics, including predictions and residuals/accuracy (whichever is applicable).
Missing data deletion	Specifies the method for handling missing data. Casewise excludes cases that contain missing data for any of the variables selected for the analysis. Mean substitution replaces missing data with the means of the respective variables (Note: This option is not applicable to categorical dependent and predictor variables).
Generate datasource, if N for input less than	Generate a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field; note that parameter k (number of observations) will be evaluated against the number of observations in the input data source, not the number of valid or selected observations.
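
The two missing data methods described above can be illustrated with a minimal Python sketch. This is not Statistica code; the function names and the use of None to mark a missing value are assumptions made for the example.

```python
from statistics import mean

def casewise_delete(rows):
    """Drop every case (row) that has a missing value (None) in any variable."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_substitute(rows):
    """Replace each missing value (None) with the mean of the valid values
    of the same variable (column). Assumes every column has at least one
    valid value and that all variables are continuous."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]

# Three cases, two continuous variables; None marks a missing value.
data = [[1.0, 10.0], [None, 20.0], [3.0, None]]
complete_cases = casewise_delete(data)
filled = mean_substitute(data)
```

In this toy data set, casewise deletion keeps only the first case, while mean substitution keeps all three cases and fills each gap with the column mean.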

Sampling

Element Name	Description
Divide data into train and test samples	Divides the data set into training and test samples. The training subset is used to fit the model, while the test subset serves as an independent check of its performance.
Sampling method	Sampling method used to divide the data set into training and test subsets. Random sampling assigns cases to the training and test samples at random. In contrast, the First N method selects the first N cases as the training set and the rest as the test sample. NOTE: You can also use a learning/testing indicator variable to sample from the data. This functionality is available on the Advanced tab of the data spreadsheet in the Data Acquisition area of the Statistica Data Miner environment. Selecting this method (i.e., a learning/testing indicator) overrides any sampling choice made on this tab.
Size of training sample	Specifies the percentage of data cases that will be used to form the training sample. The remaining valid cases in the data set will be used as the test sample.
Seed	Specifies the random generator seed for random sampling of data into train and test subsets.
Use first N cases	Selects the first N valid cases in the data set as training subset. The rest are used for testing.
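
As an illustration only, the two sampling methods can be sketched in Python as follows. The function names are hypothetical, and the random number generator is Python's own, so a given seed will not reproduce the split that Statistica produces.

```python
import random

def split_random(cases, train_percent, seed):
    """Randomly assign train_percent % of the cases to the training sample;
    the remaining cases form the test sample."""
    rng = random.Random(seed)          # seeded, so the split is repeatable
    shuffled = cases[:]                # copy; leave the caller's list intact
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]

def split_first_n(cases, n):
    """Use the first n cases for training and the rest for testing."""
    return cases[:n], cases[n:]

cases = list(range(20))                # stand-in for 20 valid cases
train_a, test_a = split_random(cases, train_percent=75, seed=1000)
train_b, test_b = split_random(cases, train_percent=75, seed=1000)
train_f, test_f = split_first_n(cases, 15)
```

Calling split_random twice with the same seed returns the same subsets, which is the purpose of the Seed option.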

SVM

Element Name	Description
SVM type	Specifies the type of the SVM model.
Capacity	Capacity parameter.
Epsilon	Epsilon parameter.
Nu	Nu parameter.

Kernel

Element Name	Description
Kernel type	Specifies the type of the Kernel used by the SVM model.
Degree	Specifies the degree of the polynomial kernel.
Gamma	Specifies the gamma parameter for polynomial, RBF and sigmoid kernels.
Coefficient	Specifies the coefficient for polynomial and sigmoid kernels.
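
The sketch below shows where Degree, Gamma and Coefficient enter the three kernels named above. The functional forms are the commonly used definitions and are an assumption here; this page does not state the exact formulas that Statistica applies.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree, gamma, coef):
    # Assumed form: (gamma * <x, y> + coef) ** degree
    return (gamma * dot(x, y) + coef) ** degree

def rbf_kernel(x, y, gamma):
    # Assumed form: exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(x, y, gamma, coef):
    # Assumed form: tanh(gamma * <x, y> + coef)
    return math.tanh(gamma * dot(x, y) + coef)

x = [1.0, 2.0]
y = [3.0, 4.0]
k_poly = polynomial_kernel(x, y, degree=2, gamma=0.5, coef=1.0)
k_rbf = rbf_kernel(x, y, gamma=0.5)
k_sig = sigmoid_kernel(x, y, gamma=0.5, coef=0.0)
```

Under these definitions Gamma is shared by all three kernels, Coefficient is used by the polynomial and sigmoid kernels only, and Degree by the polynomial kernel only, which matches the descriptions in the table.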

Cross-validation 1

Element Name	Description
Apply v-fold cross-validation	Applies v-fold cross-validation to obtain estimates of the capacity, epsilon and nu parameters.
V value	Number of cross-validation folds.
Seed	Seed value for random data shuffling for cross-validation.
Minimum C	Start value for the capacity parameter (used by cross-validation grid search).
Maximum C	End value for the capacity parameter (used by cross-validation grid search).
Increment in C	Increment for the capacity parameter (used by cross-validation grid search).
Minimum Epsilon	Start value for the epsilon parameter (used by cross-validation grid search).
Maximum Epsilon	End value for the epsilon parameter (used by cross-validation grid search).
Epsilon increment	Increment in epsilon.
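
A minimal sketch of how the minimum, maximum and increment settings could define a search grid that is evaluated by v-fold cross-validation. All names are hypothetical. The cv_error callback stands in for fitting an SVM on the training folds and measuring its error on the held-out fold, and choosing the combination with the lowest mean error is an assumption about the selection rule.

```python
import math
import random
from itertools import product

def grid_values(minimum, maximum, increment):
    """Values from minimum up to maximum in steps of increment."""
    n_steps = math.floor((maximum - minimum) / increment + 1e-9)
    return [round(minimum + i * increment, 10) for i in range(n_steps + 1)]

def v_fold_indices(n_cases, v, seed):
    """Shuffle the case indices and deal them into v folds."""
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    return [indices[i::v] for i in range(v)]

def grid_search(n_cases, v, seed, c_values, eps_values, cv_error):
    """Return (mean error, C, epsilon) for the best grid point."""
    folds = v_fold_indices(n_cases, v, seed)
    best = None
    for c, eps in product(c_values, eps_values):
        errors = []
        for k, test_idx in enumerate(folds):
            # All folds except fold k form the training set.
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            errors.append(cv_error(c, eps, train_idx, test_idx))
        mean_error = sum(errors) / len(errors)
        if best is None or mean_error < best[0]:
            best = (mean_error, c, eps)
    return best

# Toy stand-in for "train on train_idx, return error on test_idx".
def toy_error(c, eps, train_idx, test_idx):
    return (c - 3) ** 2 + (eps - 0.2) ** 2

c_grid = grid_values(1, 10, 1)          # Minimum C, Maximum C, Increment in C
eps_grid = grid_values(0.1, 0.5, 0.1)   # Minimum/Maximum Epsilon, increment
best = grid_search(50, v=10, seed=1000, c_values=c_grid,
                   eps_values=eps_grid, cv_error=toy_error)
```

The number of model fits grows as the product of the grid sizes and the number of folds, so small increments over wide ranges can be expensive.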

Cross-validation 2

Element Name	Description
Minimum Nu	Start value for the nu parameter (used by cross-validation grid search).
Maximum Nu	End value for the nu parameter (used by cross-validation grid search).
Nu increment	Increment in nu (used by cross-validation grid search).

Training

Element Name	Description
Max number of iterations	The maximum number of iterations that can be applied in training the SVM model.
Stop at accuracy	Training stops when the given level of accuracy is reached.
Cache size, in MB	Specifies the size of the cache, in MB, used during training.
Shrink data	Shrink data
Scale inputs	Check this option to linearly scale the inputs within the range 0 to 1.
Scale outputs	Check this option to linearly scale the outputs within the range -1 to 1.
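
The linear scaling performed by the Scale inputs and Scale outputs options can be sketched as min-max rescaling. This is an illustration only: the function name is hypothetical, and taking the minimum and maximum from the data being scaled is an assumption.

```python
def linear_scale(values, new_min, new_max):
    """Linearly map values so that min(values) -> new_min and
    max(values) -> new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # A constant variable has no range; map it to the midpoint.
        return [(new_min + new_max) / 2 for _ in values]
    return [
        new_min + (v - old_min) * (new_max - new_min) / (old_max - old_min)
        for v in values
    ]

inputs = [10.0, 15.0, 20.0]
outputs = [100.0, 150.0, 200.0]
scaled_inputs = linear_scale(inputs, 0.0, 1.0)     # [0.0, 0.5, 1.0]
scaled_outputs = linear_scale(outputs, -1.0, 1.0)  # [-1.0, 0.0, 1.0]
```

Predictions made on the scaled output must be mapped back to the original units by inverting the same transformation.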

Results

Element Name	Description
Subset used to generate results	Select the subset for which the results should be displayed.
Include inputs	Includes the independent variables in spreadsheets and histograms.
Include outputs	Includes the dependent variables in spreadsheets and histograms.
Include predictions	Includes predictions in spreadsheets and histograms.
Include residuals	Includes residuals in spreadsheets and histograms.
Creates residual statistics	Creates predicted and residual statistics for each case, depending on the selected level of detail.

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name	Description
Generates C/C++ code	Generates C/C++ code for deployment of the predictive model.
Generates SVB code	Generates Statistica Visual Basic code for deployment of the predictive model.
Generates PMML code	Generates PMML (Predictive Model Markup Language) code for deployment of the predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.