K-Nearest Neighbors

General

Element Name	Description
Detail of computed results reported	Level of detail of the computed results. At the Minimal level, spreadsheets of the analysis summary, model specifications, and descriptive statistics (regression or classification statistics) are displayed. At the Comprehensive level, a spreadsheet of predictions and residuals and their histogram plots are also displayed. The All results level additionally displays a spreadsheet (if the 'Creates residual statistics' option is selected) containing all data set variables and their statistics, including predictions and residuals/accuracy (whichever is applicable).
Missing data deletion

Sampling

Element Name	Description
Sampling method	Sampling method used to divide the data set into example and test subsets. Random sampling assigns cases to the example and test samples at random. In contrast, the First N method uses the first N cases as the example (training) set and the rest as the test sample. NOTE: You can also sample with a learning/testing indicator variable. This functionality is available on the Advanced tab of the data spreadsheet under Data Acquisition in the Statistica Data Miner environment. Selecting the learning/testing indicator method overrides any sampling choice made on this tab.
Size of the example set (%)	Specifies the percentage of data cases that will be used as examples. The remaining valid cases in the dataset will be used to form the test sample.
Seed	Specifies the random generator seed used to divide the data into the example and test sets.
Use first N cases	Selects the first N valid cases in the data set as the training subset. The remaining cases are used for testing.
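
The two sampling methods described above can be illustrated with a minimal Python sketch. This is not Statistica code; the function name split_cases, its parameters, and the default seed are hypothetical and chosen only to mirror the options in this table.

```python
import random

def split_cases(n_cases, method="random", example_pct=75.0, seed=1000, first_n=None):
    """Return (example_indices, test_indices) for n_cases valid cases.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    indices = list(range(n_cases))
    if method == "first_n":
        # First N: the first N cases form the example set, the rest the test sample.
        return indices[:first_n], indices[first_n:]
    # Random: shuffle with a fixed seed, then cut at the requested percentage.
    rng = random.Random(seed)
    rng.shuffle(indices)
    n_examples = round(n_cases * example_pct / 100.0)
    return sorted(indices[:n_examples]), sorted(indices[n_examples:])

# 20 cases, 75% used as examples, reproducible through the seed.
examples, test = split_cases(20, method="random", example_pct=75.0, seed=1000)
```

Reusing the same seed reproduces the same split, which is the purpose of the Seed option.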

Options

Element Name	Description
Number of nearest neighbors.
Distance measure	Specifies the metric to be used for measuring the distance between two points in the input space
Standardize distances	Select this option to standardize distances.
Use weighted average\voting for predictions
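
To show how the number of neighbors, the distance measure, and weighted voting interact, here is a minimal Python sketch of k-nearest-neighbor classification. It is not the Statistica implementation: the function names, the two distance measures shown, and the inverse-distance weighting scheme are assumptions made for illustration.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(examples, labels, query, k=3, distance=euclidean, weighted=False):
    """Predict a class for `query` from its k nearest examples."""
    # Sort the examples by distance to the query and keep the k closest.
    nearest = sorted(
        (distance(x, query), label) for x, label in zip(examples, labels)
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        # Weighted voting (assumed inverse-distance): closer neighbors count more.
        # Unweighted voting: every neighbor counts once.
        votes[label] += 1.0 / (d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)

examples = [(0.0,), (3.0,), (3.5,)]
labels = ["A", "B", "B"]
plain = knn_classify(examples, labels, (1.0,), k=3)                 # majority vote
near = knn_classify(examples, labels, (1.0,), k=3, weighted=True)   # distance-weighted vote
```

In this small data set the unweighted vote follows the two more distant "B" examples, while the weighted vote follows the single close "A" example, which is the kind of difference the weighting option is meant to capture.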

Cross-validation

Element Name	Description
Apply v-fold cross-validation	Applies v-fold cross-validation to obtain an estimate of the number of nearest neighbors (K).
V value	Number of cross-validation folds
Seed	Seed value for random data shuffling for cross-validation
Minimum K	Start value for number of nearest neighbors (used by cross-validation grid search)
Maximum K	End value for number of nearest neighbors (used by cross-validation grid search)
Increment in K
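
The grid search over K described by the options above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (knn_predict, cv_error, search_k), the use of one-dimensional inputs, and the misclassification rate as the error measure are all hypothetical and not taken from Statistica.

```python
import random

def knn_predict(train, query, k):
    """Majority vote among the k nearest (x, label) pairs in `train` (1-D inputs)."""
    nearest = sorted(train, key=lambda case: abs(case[0] - query))[:k]
    labels = [label for _, label in nearest]
    # sorted() makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)

def cv_error(cases, k, v=5, seed=1):
    """Misclassification rate of k-NN estimated by v-fold cross-validation."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)        # seeded shuffle before folding
    folds = [shuffled[i::v] for i in range(v)]   # v roughly equal folds
    errors = 0
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        errors += sum(knn_predict(train, x, k) != y for x, y in held_out)
    return errors / len(cases)

def search_k(cases, k_min=1, k_max=10, k_step=1, v=5, seed=1):
    """Return the K between k_min and k_max (in steps of k_step) with the lowest CV error."""
    candidates = range(k_min, k_max + 1, k_step)
    return min(candidates, key=lambda k: cv_error(cases, k, v, seed))

# Two well-separated clusters of 10 cases each.
cases = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(100, 110)]
best_k = search_k(cases, k_min=1, k_max=5, k_step=2, v=5, seed=1)
```

Here k_min, k_max, and k_step play the roles of Minimum K, Maximum K, and Increment in K; when several values of K tie on error, this sketch returns the smallest.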

Memory usage

Element Name	Description
Restrict memory usage	Restrict the amount of memory that can be used by the analysis.
Amount of memory that can be used by the analysis	Amount of memory that can be used by the analysis.

Results

Element Name	Description
Subset used to generate results
Include inputs	Includes the independent variables in spreadsheets and histograms.
Include outputs	Includes the dependent variables in spreadsheets and histograms.
Include predictions	Includes predictions in spreadsheets and histograms.
Include residuals	Includes residuals in spreadsheets and histograms.
Include confidence levels (or standard deviations)	Includes classification confidence levels or standard deviations for regression in spreadsheets and histograms (meaningful only when number of nearest neighbors is larger than one).
Creates residual statistics	Creates predicted and residual statistics for each case, depending on the selected level of detail.
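
As a rough illustration of why confidence levels and standard deviations are meaningful only with more than one neighbor, the following Python sketch derives both from the neighbors of a single case. How Statistica actually computes these values is not stated here; the vote-share and population-standard-deviation definitions below, and both function names, are assumptions.

```python
import statistics

def regression_prediction(neighbor_outputs):
    """Return (prediction, spread) from the outputs of the k nearest neighbors."""
    mean = statistics.fmean(neighbor_outputs)
    # Population standard deviation of the neighbor outputs (assumed definition).
    spread = statistics.pstdev(neighbor_outputs)
    return mean, spread

def classification_confidence(neighbor_labels):
    """Return {class: share of the neighbor votes} (assumed definition)."""
    n = len(neighbor_labels)
    return {label: neighbor_labels.count(label) / n for label in set(neighbor_labels)}

# With a single neighbor the spread is always 0 and the confidence always 1.
single = regression_prediction([5.0])
shares = classification_confidence(["a", "a", "b", "a"])
```

With one neighbor the spread collapses to zero and the winning class always receives the full vote, so neither value carries information about uncertainty.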

Deployment

Deployment is available if the Statistica installation is licensed for this feature.

Element Name	Description
Generates C/C++ code	Generates C/C++ code for deployment of predictive model.
Generates SVB code	Generates Statistica Visual Basic code for deployment of predictive model.
Generates PMML code	Generates PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
Saves C/C++ code	Saves C/C++ code for deployment of predictive model.
File name for C/C++ code	Specify the name and location of the file in which to save the (C/C++) deployment code information.
Saves SVB code	Saves Statistica Visual Basic code for deployment of predictive model.
File name for SVB code	Specify the name and location of the file in which to save the (SVB/VB) deployment code information.
Saves PMML code	Saves PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.
File name for PMML (XML) code	Specify the name and location of the file in which to save the (PMML/XML) deployment code information.
