Discriminant Function Analysis Results - Classification Tab
Classification
Select the Classification tab of the Discriminant Function Analysis Results dialog box to access the options described here. Use these options to review various classification statistics. Note that if User defined a priori probabilities are selected in the A priori classification probabilities group box, then, whenever you click a button on this tab, the Specify A Priori Classification Probabilities dialog box will be displayed, in which you specify the a priori classification probabilities for each group.
- Classification functions
- Click the Classification functions button to produce a spreadsheet containing the classification functions. Classification functions are computed for each group and can be used directly to classify cases: a case is classified into the group for which it has the highest classification score. You can also enter these functions in the Long name field to define transformations for new variables (one for each group); as you then enter new cases, STATISTICA automatically computes the classification scores for each group.
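As a rough illustration of how classification scores are used, the following Python sketch assumes the usual linear form of a classification function (a constant plus a weighted sum of the variables). The coefficients, variable names, and helper functions are hypothetical and are not taken from any STATISTICA output.

```python
# Hypothetical coefficients, for illustration only: one linear function per
# group, where score = constant + sum(weight * value).
CLASSIFICATION_FUNCTIONS = {
    "A": {"constant": -10.0, "weights": {"x1": 2.0, "x2": 1.0}},
    "B": {"constant": -20.0, "weights": {"x1": 1.0, "x2": 4.0}},
}


def classification_scores(case, functions):
    """Return the classification score of `case` for every group."""
    scores = {}
    for group, func in functions.items():
        weighted_sum = sum(w * case[var] for var, w in func["weights"].items())
        scores[group] = func["constant"] + weighted_sum
    return scores


def classify(case, functions):
    """Assign the case to the group with the highest classification score."""
    scores = classification_scores(case, functions)
    return max(scores, key=scores.get)


case = {"x1": 3.0, "x2": 5.0}
scores = classification_scores(case, CLASSIFICATION_FUNCTIONS)  # A: 1.0, B: 3.0
predicted = classify(case, CLASSIFICATION_FUNCTIONS)            # "B"
```

Here the case scores higher on the function for group B, so it would be assigned to group B.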
- Select
- Click the Select button to display the Analysis/Graph Case Selection Conditions dialog box. You can use case selection conditions to classify only cases that were not used to compute the current classification functions.
- Classification matrix
- Click the Classification matrix button to produce a spreadsheet containing the classification matrix. The classification matrix contains information about the number and percent of correctly classified cases in each group. Note that you can use the standard case selection conditions (click the Select button) to classify only cases that were not used to compute the current classification functions. Also, the computations for the classification of cases will be based on the a priori classification probabilities, which are either 1) the same for all groups, 2) proportional to the respective group sizes, or 3) user defined (see A priori classification probabilities, below).
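The content of a classification matrix can be sketched as follows. This is a minimal illustration, not STATISTICA's implementation; the function name and the example labels are assumptions made for the sketch.

```python
from collections import Counter


def classification_matrix(observed, predicted):
    """Cross-tabulate observed vs. predicted groups and report the percent
    of correctly classified cases for each observed group."""
    groups = sorted(set(observed) | set(predicted))
    counts = Counter(zip(observed, predicted))
    matrix = {}
    for obs in groups:
        row = {pred: counts[(obs, pred)] for pred in groups}
        total = sum(row.values())
        percent = 100.0 * row[obs] / total if total else 0.0
        matrix[obs] = {"counts": row, "percent_correct": percent}
    return matrix


# Hypothetical observed and predicted group memberships for five cases.
observed = ["A", "A", "A", "B", "B"]
predicted = ["A", "A", "B", "B", "B"]
matrix = classification_matrix(observed, predicted)
```

In this example, two of the three cases in group A and both cases in group B are classified correctly.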
- Classification of cases
- Click the Classification of cases button to produce a spreadsheet with the classification for each selected case. The classifications are ordered into a first, second, and third choice. The column under the header 1 contains the first classification choice, that is, the group for which the respective case had the highest posterior probability (see option Posterior probabilities, below). The rows marked by the asterisk are cases that are misclassified.
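The ordering into first, second, and third choices can be pictured as a sort of the groups by posterior probability. The sketch below is illustrative only; the function names and the probabilities are hypothetical.

```python
def rank_choices(posteriors, n_choices=3):
    """Order groups from highest to lowest posterior probability."""
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    return ranked[:n_choices]


def classification_row(observed_group, posteriors):
    """Return (flag, choices); the flag is '*' when the first choice differs
    from the observed group, that is, when the case is misclassified."""
    choices = rank_choices(posteriors)
    flag = "*" if choices[0] != observed_group else ""
    return flag, choices


# Hypothetical posterior probabilities for one case observed in group A.
flag, choices = classification_row("A", {"A": 0.2, "B": 0.7, "C": 0.1})
```

Because the first choice (B) differs from the observed group (A), this case would be flagged as misclassified.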
- Squared Mahalanobis distances
- Click the Squared Mahalanobis distances button to produce a spreadsheet with the squared Mahalanobis distances of each case from each group centroid. These distances are similar to the squared Euclidean distances of the respective case from the centroids for each group (the point defined by the means for all variables in the respective group). However, unlike the Euclidean distance, the Mahalanobis distance takes into account the intercorrelations between the variables in the model (which define the multivariate space).
A case will generally be classified into the group that it is closest to, unless widely disparate a priori probabilities lead to very different posterior classification probabilities. Asterisks in the first column of the spreadsheet will mark misclassified cases.
A particularly useful graph for this spreadsheet is the icon plot, which enables you to quickly compare the Mahalanobis distances of each case from the different groups.
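To make the difference from the Euclidean distance concrete, the following sketch computes the squared Mahalanobis distance for the two-variable case, using the closed-form inverse of a 2 x 2 covariance matrix. It assumes that a single covariance matrix (for example, the pooled within-group matrix) is supplied; the function name and the numbers are hypothetical.

```python
def squared_mahalanobis_2d(point, centroid, cov):
    """Squared Mahalanobis distance d2 = v' * inverse(cov) * v for two
    variables, where v = point - centroid."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - centroid[0]
    dy = point[1] - centroid[1]
    # inverse(cov) = (1 / det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


identity = ((1.0, 0.0), (0.0, 1.0))
correlated = ((2.0, 1.0), (1.0, 2.0))  # two positively correlated variables

# With uncorrelated, unit-variance variables the result equals the squared
# Euclidean distance.
d2_euclidean_like = squared_mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity)

# Both points below have a squared Euclidean distance of 2 from the centroid,
# but their squared Mahalanobis distances differ.
d2_along = squared_mahalanobis_2d((1.0, 1.0), (0.0, 0.0), correlated)
d2_against = squared_mahalanobis_2d((1.0, -1.0), (0.0, 0.0), correlated)
```

The point that lies along the direction of the correlation is closer in the Mahalanobis sense than the point that lies against it, even though both are equally far away in the Euclidean sense.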
- Posterior probabilities
- Click the Posterior probabilities button to produce a spreadsheet containing the posterior probabilities. Given the Mahalanobis distances of a case from the different group centroids, we can compute the respective posterior classification probabilities for each group. In general, the further away a case is from a group centroid, the less likely it is that the case belongs to that group. The posterior probabilities are determined by the Mahalanobis distances and the a priori classification probabilities (see below). A case will be classified into the group for which it has the highest posterior classification probability. Misclassified cases will be marked in the spreadsheet by asterisks.
A particularly useful graph for this spreadsheet is the icon plot, which enables you to quickly compare the posterior probabilities for each case, for each group.
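The relationship between distances, a priori probabilities, and posterior probabilities can be sketched as follows. The sketch assumes multivariate normality with a common covariance matrix, in which case each posterior probability is proportional to the a priori probability times exp(-d2 / 2), where d2 is the squared Mahalanobis distance. The function name and the numbers are hypothetical.

```python
import math


def posterior_probabilities(squared_distances, priors):
    """Posterior probability per group, proportional to
    prior * exp(-0.5 * squared Mahalanobis distance)."""
    # Subtracting the smallest distance leaves the ratios unchanged and
    # avoids underflow when all distances are large.
    d_min = min(squared_distances.values())
    weights = {
        group: priors[group] * math.exp(-0.5 * (d2 - d_min))
        for group, d2 in squared_distances.items()
    }
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}


distances = {"A": 2.0, "B": 1.0}  # the case is closer to group B

equal_priors = posterior_probabilities(distances, {"A": 0.5, "B": 0.5})
unequal_priors = posterior_probabilities(distances, {"A": 0.9, "B": 0.1})
```

With equal a priori probabilities the case is assigned to the closer group, B. With widely disparate a priori probabilities (0.9 versus 0.1) the posterior probability for group A is higher, even though the case is farther from the centroid of group A.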
- Save scores
- Click the Save scores button to display a standard variable selection dialog box, where you can choose which variables (if any) you want to display along with the scores specified in the Score to save for each case group box (see below). Note that the variable(s) and the scores will then be displayed in an individual window (regardless of the settings in the Options dialog box - Output Manager tab or the Analysis/Graph Output Manager dialog box). You can, however, add the spreadsheet to a workbook or report using the corresponding buttons. Note that in order to save the spreadsheet, you must select the spreadsheet and select Save or Save As from the File menu. You can then use this data file to perform additional analyses if you wish.
- A priori classification probabilities
- There are three choices available in the A priori classification probabilities group box: Proportional to group sizes, Same for all groups, and User defined. The a priori probabilities specify how likely it is, without using any prior knowledge of the values for the variables in the model, that a case will fall into one of the groups. For example, in an educational study of high school drop-outs, it may happen that overall, there are fewer drop-outs than students who stay (i.e., there are different base rates); thus, the a priori probability that a student drops out is lower than that a student remains in school. The a priori probabilities can greatly affect the accuracy of the classification. If differential base rates are not of interest for the study, or if one knows that there are about an equal number of cases in each group, then one could set the a priori probabilities to be the Same for all groups. If the differential base rates are reflected in the sample sizes (as they would be, if the sample is a probability sample) then set the a priori probabilities to be Proportional to group sizes.
Finally, if you have specific knowledge about the base rates (for example, based on previous research), then set the a priori probabilities to User defined. In that case, after you subsequently click a button on this tab, the Specify A Priori Classification Probabilities dialog box will be displayed allowing you to specify the a priori probabilities for each group. If those probabilities do not add up to 1, STATISTICA will automatically adjust them proportionately.
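The three choices can be sketched in Python as follows. The function name, the mode labels, and the group sizes are hypothetical, and the proportional adjustment is assumed here to mean dividing each value by the sum of all values.

```python
def a_priori_probabilities(group_sizes, mode="proportional", user_defined=None):
    """Return a priori probabilities that sum to 1 for the chosen mode:
    'same', 'proportional', or 'user'."""
    if mode == "same":
        raw = {group: 1.0 for group in group_sizes}
    elif mode == "proportional":
        raw = {group: float(n) for group, n in group_sizes.items()}
    elif mode == "user":
        raw = {group: float(user_defined[group]) for group in group_sizes}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    # Rescale so that the values add up to 1 (assumed adjustment rule).
    return {group: value / total for group, value in raw.items()}


# Hypothetical sample: 20 drop-outs and 80 students who stayed in school.
sizes = {"dropout": 20, "stay": 80}

same = a_priori_probabilities(sizes, mode="same")
proportional = a_priori_probabilities(sizes, mode="proportional")
user = a_priori_probabilities(sizes, mode="user",
                              user_defined={"dropout": 1, "stay": 3})
```

In this sketch, user-defined values of 1 and 3 do not add up to 1, so they are rescaled to 0.25 and 0.75.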
- Score to save for each case
- Use the Score to save for each case group box to select which score should be displayed when you click the Save scores button (see above). You may choose to display either the actual classifications for each case (Save classification for case), the squared Mahalanobis distances (Save distance for case), or the posterior probabilities (Save posterior probability for case).
- Max. number of cases in a single results spreadsheet
- Enter the Max. number of cases in a single results spreadsheet value to control the maximum number of cases to be displayed in a single spreadsheet, when choosing any of the case-classification statistics (posterior probabilities, Mahalanobis distances, classifications) and canonical variate options (see the Canonical Analysis tabs). If there are more valid cases than the number specified in this field, then multiple spreadsheets will be displayed, with sequential "chunks" of the cases. The default value is 100,000.
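The splitting of cases into sequential chunks can be pictured with the short sketch below. The function name is hypothetical; only the default limit of 100,000 is taken from the description above.

```python
def split_into_spreadsheets(cases, max_cases=100_000):
    """Split the list of valid cases into sequential chunks, each holding
    at most `max_cases` cases."""
    if max_cases < 1:
        raise ValueError("max_cases must be at least 1")
    return [cases[i:i + max_cases] for i in range(0, len(cases), max_cases)]


# 250 hypothetical case numbers with a limit of 100 cases per spreadsheet.
chunks = split_into_spreadsheets(list(range(1, 251)), max_cases=100)
chunk_sizes = [len(chunk) for chunk in chunks]  # [100, 100, 50]
```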