Support Vector Machines - Sampling Tab
Select the Sampling tab of the Support Vector Machines dialog box to access options to divide the data into training and test samples using one of the sampling methods provided. The training sample is used to construct the SVM model while the test sample is used as an independent data set to check the performance of the model after training has ended. Both training and test errors are accessible on the Support Vector Machine Results dialog box - Quick tab.
- Divide data into train and test samples
- Select this check box to divide the data set into training and test samples. Training cases are used to construct (train) the model, while test cases are used to evaluate its performance, i.e., how well the model would perform when presented with unseen data.
- Use random sampling
- Select this option button to generate the training and test samples randomly. STATISTICA SVM uses a uniform random number generator to achieve this.
- Size of training sample (%)
- In this field, specify the percentage of the valid cases (determined by case selection conditions) in the data set that will be used to form the training set. Any remaining valid cases will be used to form the test sample.
- Seed
- In this field, specify the seed for the random number generator used to randomly divide the data into training and test samples.
- Use sample variable
- A sample variable is a variable in the data set that can be used to indicate case subsets. It must be a nominal variable with values taken from the set. Such variables can be generated from STATISTICA Spreadsheet capabilities.
- Sample
- With the Sample option, you can use a sub-sample of cases for forming your training sample. Click the Sample button to display the Sampling variable dialog box, where you can switch the Sample option on or off as well as select a variable that will be used as the sample identifier variable.
- Use first N cases
- Select this option button to use the first N valid cases [determined by the case selection conditions (see case selections)] of the data set as the training sample. Any remaining valid cases will be used as the test sample. Use this option only if you have good reason to believe that the first N cases in the data set make a good training sample (i.e., a sample that is representative of the original data set).
- First N
- The number of valid cases, counted from the start of the data set, to be used as the training sample.
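The three sampling methods described above can be illustrated with a minimal Python sketch. This is not STATISTICA's implementation: the function names, the truncation used to turn a percentage into a case count, and the "Train" code for the sample variable are all assumptions made for illustration. Cases are represented by their positions in the list of valid cases.

```python
import random


def split_random(n_cases, train_percent, seed):
    """Randomly divide case indices into training and test samples.

    Assumption: the training count is truncated to a whole number of cases;
    the rounding rule actually used by STATISTICA is not stated in this topic.
    """
    rng = random.Random(seed)          # same seed -> same split
    indices = list(range(n_cases))
    rng.shuffle(indices)               # uniform random permutation
    n_train = int(n_cases * train_percent / 100)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def split_by_sample_variable(codes, train_code="Train"):
    """Divide cases according to a nominal sample variable.

    Assumption: 'Train' is a hypothetical code; cases with any other
    code are placed in the test sample.
    """
    train = [i for i, code in enumerate(codes) if code == train_code]
    test = [i for i, code in enumerate(codes) if code != train_code]
    return train, test


def split_first_n(n_cases, n_first):
    """Use the first N valid cases for training and the remainder for testing."""
    n_first = min(n_first, n_cases)    # guard against N larger than the data set
    return list(range(n_first)), list(range(n_first, n_cases))


if __name__ == "__main__":
    print(split_random(10, 75, seed=1))
    print(split_by_sample_variable(["Train", "Test", "Train", "Test"]))
    print(split_first_n(10, 7))
```

In this sketch, calling split_random twice with the same seed reproduces the same split, which is the purpose of the Seed field. The first-N split depends entirely on the order of cases in the data set, which is why it is appropriate only when that order is unrelated to the quantity being modeled.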
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.