K-Nearest Neighbors - Sampling Tab

Select the Sampling tab of the K-Nearest Neighbors dialog box to access options to divide the data into prototype and test samples. The testing sample is used to create a set of new query points for which the outcomes are estimated from the known values of the K nearest cases in the prototype sample. Detailed statistics of the predictions are accessible on the Quick tab as well as in the Summary box at the top of the Results dialog box.
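As an illustration of how outcomes for test cases can be estimated from the K nearest prototype cases, here is a minimal Python sketch. It is not STATISTICA's implementation: the function name `knn_predict`, the Euclidean distance measure, and the majority-vote rule are assumptions made for this example only.

```python
import math
from collections import Counter

def knn_predict(prototype, query, k):
    """Classify `query` by majority vote among its k nearest prototype cases.

    `prototype` is a list of (features, label) pairs. Euclidean distance
    and simple majority voting are assumptions of this sketch.
    """
    # Sort prototype cases by distance to the query and keep the k closest.
    nearest = sorted(prototype, key=lambda case: math.dist(case[0], query))[:k]
    # Count the labels among those neighbors and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical prototype sample: two cases labeled "a", two labeled "b".
prototype = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]

# Hypothetical test sample: each case is a query point.
test_sample = [(0.05, 0.1), (1.0, 0.9)]
predictions = [knn_predict(prototype, q, k=3) for q in test_sample]
```

Each test case is treated as a query point, and only the prototype cases are searched for neighbors, which is why the way the data are divided affects the reported prediction statistics.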

Sampling method for selecting the example set: Following are descriptions of the options in this group box.
Use random sampling: Select this option button to divide the data randomly into prototype and testing samples. STATISTICA KNN uses a uniform random number generator for this purpose.
Size of prototype sample (%): In this field, specify the percentage of the valid cases [determined by case selection conditions (see case selection)] in the data set to be used to construct the prototype sample. Any remaining valid cases will be used to form the test sample.
Seed: In this field, specify the random number generator seed to be used in the process of (randomly) dividing the data into prototype and test samples.
Use sample variable: A sample variable is a variable in the data set that can be used to indicate case subsets. It must be a nominal variable with values taken from the set. Such variables can be generated using STATISTICA Spreadsheet capabilities.
Sample: With the Sample option, you can use a set of cases for forming your prototype sample. Click the Sample button to display the Sampling variable dialog box, where you can switch the Sample option on or off as well as select a variable that will be used as the sample identifier variable.
Use first N cases: Select this option button to use the first N valid cases [determined by the case selection conditions (see case selection)] of the data set as the prototype sample. Any remaining valid cases will be used as the testing sample. Use this option only if you have good reason to believe that the first N cases in the data set make a good prototype sample (i.e., a sample that is representative of the original data set).
First N: In this field, specify N, the number of valid cases from the start of the data set to be used as the prototype sample.
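The three sampling methods described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not STATISTICA's code: the function names are hypothetical, Python's `random.Random` stands in for the program's uniform random number generator, the rounding of the percentage to a case count is a guess, and the sketch assumes that cases not flagged by the sample variable go to the test sample.

```python
import random

def split_random(cases, prototype_pct, seed):
    """Randomly assign about prototype_pct percent of cases to the prototype sample."""
    rng = random.Random(seed)          # the seed makes the split reproducible
    indices = list(range(len(cases)))
    rng.shuffle(indices)
    # Assumption: the percentage is rounded to the nearest whole case count.
    n_proto = round(len(cases) * prototype_pct / 100)
    chosen = set(indices[:n_proto])
    prototype = [c for i, c in enumerate(cases) if i in chosen]
    test = [c for i, c in enumerate(cases) if i not in chosen]
    return prototype, test

def split_by_sample_variable(cases, sample_values, prototype_code):
    """Use a per-case sample variable to pick the prototype sample.

    `prototype_code` is a hypothetical value marking prototype cases;
    sending all other cases to the test sample is an assumption.
    """
    prototype = [c for c, v in zip(cases, sample_values) if v == prototype_code]
    test = [c for c, v in zip(cases, sample_values) if v != prototype_code]
    return prototype, test

def split_first_n(cases, n):
    """Take the first n valid cases as the prototype sample, the rest as test."""
    return cases[:n], cases[n:]

# Hypothetical data: ten valid cases identified by their index.
cases = list(range(10))
random_proto, random_test = split_random(cases, prototype_pct=70, seed=42)
first_proto, first_test = split_first_n(cases, 4)
var_proto, var_test = split_by_sample_variable(
    ["x", "y", "z"], ["train", "test", "train"], prototype_code="train"
)
```

Reusing the same seed reproduces the same random split, which is useful when comparing results across runs. The first-N split depends entirely on the order of the cases, so it is only appropriate when that order is unrelated to the outcome.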
