K-Nearest Neighbors with Deployment (Classification)
Full-featured implementation of K-Nearest Neighbors (KNN) for classification problems. The final solution is automatically stored for deployment.
General
Element Name | Description |
---|---|
Detail of computed results reported | Specifies the level of detail of the reported results. At the Minimal level, spreadsheets of the analysis summary, model specifications, and descriptive statistics (regression or classification statistics) are displayed. At the Comprehensive level, a spreadsheet of predictions and accuracy, as well as their histogram plots, is displayed. In addition to the above, the All results level displays a spreadsheet (if the 'Creates accuracy statistics' option is selected) containing all data set variables and their statistics, including predictions and accuracy (whichever are applicable). |
Missing data deletion. | |
Generate datasource, if N for input less than | Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than N observations, where N is the value specified in this edit field. Note that N is evaluated against the number of observations in the input data source, not the number of valid or selected observations. |
Sampling
Element Name | Description |
---|---|
Sampling method | Sampling method used to divide the data set into example and test subsets. Random sampling divides the data set into example and test samples at random. This is in contrast to the First N method, which selects the first N cases as the training set and the rest as the test sample. NOTE: You may also use a learning/testing indicator variable to sample from the data. You can access this functionality via the Advanced tab of the data spreadsheet in the Data Acquisition of the Statistica Data Miner environment. Selecting this method (i.e., the learning/testing indicator) overrides any choice of sampling you make on this tab. |
Size of example set (%) | Specifies the percentage of data cases that will be used as examples. Any remaining valid cases in the data set will be used to form the test sample. |
Seed | Specifies the random generator seed for dividing data into the example and test sets. |
Sampling variable | Divides the data into example and test subsets. |
Sampling text | Text for dividing the data into example and test subsets. |
Use first N cases | Selects the first N valid cases in the data set as the training subset. The remaining cases are used for testing. |
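
The two sampling methods described above can be illustrated with a minimal Python sketch. This is not Statistica's implementation: the function names `split_random` and `split_first_n` are hypothetical, and the rounding rule for the example-set size is an assumption.

```python
import random

def split_random(n_cases, example_pct, seed):
    """Randomly assign case indices to example and test subsets (hypothetical helper)."""
    rng = random.Random(seed)            # a fixed seed makes the split reproducible
    indices = list(range(n_cases))
    rng.shuffle(indices)
    # Assumption: the example-set size is the percentage rounded to a whole case.
    n_example = round(n_cases * example_pct / 100)
    return sorted(indices[:n_example]), sorted(indices[n_example:])

def split_first_n(n_cases, n):
    """Use the first n cases as examples and the rest for testing (hypothetical helper)."""
    n = min(n, n_cases)
    return list(range(n)), list(range(n, n_cases))

example_idx, test_idx = split_random(n_cases=100, example_pct=75, seed=1000)
first_idx, rest_idx = split_first_n(n_cases=5, n=3)
```

Calling `split_random` twice with the same seed returns the same subsets, which is the purpose of the Seed option.
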
Cross-validation
Element Name | Description |
---|---|
Apply v-fold cross-validation | Applies v-fold cross-validation to obtain an estimate of the number of nearest neighbors (K). |
V value | Number of cross-validation folds. |
Seed | Seed value for random data shuffling for cross-validation. |
Minimum K | Start value for number of nearest neighbors (used by cross-validation grid search). |
Maximum K | End value for number of nearest neighbors (used by cross-validation grid search). |
Increment in K | Step size for number of nearest neighbors (used by cross-validation grid search). |
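
The grid search over K described by these options can be sketched as follows. This is a minimal illustration in plain Python, not Statistica's algorithm: the function names, the squared Euclidean distance, the fold assignment, and the tie-breaking rule (smallest K wins) are all assumptions.

```python
import random
from collections import Counter

def knn_predict(examples, labels, query, k):
    """Majority vote among the k examples closest to query (squared Euclidean distance)."""
    order = sorted(
        range(len(examples)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(examples[i], query)),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cross_validate_k(X, y, v, k_min, k_max, k_step, seed):
    """Return (best_k, error_by_k) from a v-fold grid search over k (hypothetical helper)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)          # seeded shuffle before forming folds
    folds = [idx[f::v] for f in range(v)]     # v roughly equal-sized folds
    error_by_k = {}
    for k in range(k_min, k_max + 1, k_step):  # Minimum K, Maximum K, Increment in K
        errors = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            # Count misclassified cases in the held-out fold.
            errors += sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in fold)
        error_by_k[k] = errors / len(X)
    # Assumption: ties are resolved in favor of the smallest k.
    best_k = min(error_by_k, key=error_by_k.get)
    return best_k, error_by_k
```

The value of K with the lowest cross-validation error is the one a search of this kind would select for the final model.
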
Results
Element Name | Description |
---|---|
Subset used to generate results | |
Include inputs | Includes the independent variables in spreadsheets and histograms. |
Include outputs | Includes the dependent variables in spreadsheets and histograms. |
Include predictions | Includes predictions in spreadsheets and histograms. |
Include accuracy | Includes prediction accuracy in spreadsheets and histograms. |
Include confidence levels | Includes classification confidence levels in spreadsheets and histograms (meaningful only when number of nearest neighbors is larger than one). |
Creates residual statistics | Creates predicted and residual statistics for each case, depending on the selected level of detail. |
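
To see why confidence levels are meaningful only when more than one nearest neighbor is used, consider a minimal sketch in which the confidence for each class is the share of neighbor votes it receives. This definition is an assumption made for illustration; the source does not state how Statistica computes confidence levels, and `class_confidences` is a hypothetical name.

```python
from collections import Counter

def class_confidences(neighbor_labels):
    """Share of the neighbor votes received by each class (assumed definition)."""
    counts = Counter(neighbor_labels)
    total = len(neighbor_labels)
    return {label: count / total for label, count in counts.items()}

# With three neighbors the vote can be split between classes.
three_neighbors = class_confidences(["a", "a", "b"])

# With a single neighbor the predicted class always receives the whole vote.
one_neighbor = class_confidences(["a"])  # {"a": 1.0}
```

Under this definition a single neighbor always yields a confidence of 1.0 for the predicted class, so the value carries no information.
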
Deployment
Deployment is available if the Statistica installation is licensed for this feature.
Element Name | Description |
---|---|
Generates C/C++ code | Generates C/C++ code for deployment of predictive model. |
Generates SVB code | Generates Statistica Visual Basic code for deployment of predictive model. |
Generates PMML code | Generates PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets. |
Saves C/C++ code | Saves C/C++ code for deployment of predictive model. |
File name for C/C++ code | Specifies the name and location of the file in which to save the (C/C++) deployment code information. |
Saves SVB code | Saves Statistica Visual Basic code for deployment of predictive model. |
File name for SVB code | Specifies the name and location of the file in which to save the (SVB/VB) deployment code information. |
Saves PMML code | Saves PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets. |
File name for PMML (XML) code | Specifies the name and location of the file in which to save the (PMML/XML) deployment code information. |
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.