Input Formats in Correspondence Analysis - Raw Data

If the Raw data (requires tabulation) option button is selected (in the Input group box on either the Correspondence Analysis (CA): Table Specifications Startup Panel - Correspondence Analysis (CA) tab or the Multiple Correspondence Analysis (MCA): Table Specifications Startup Panel - Multiple Correspondence Analysis (MCA) tab), Statistica expects categorical grouping variables as input, with codes that uniquely identify the category to which each case belongs. Statistica then tabulates these variables to compute the input table. For example, the variables might contain the following codes:

STAFFGRP SMOKING
Sr.Manag None
Sr.Manag Light
Sr.Manag Medium
Sr.Manag Heavy
Jr.Manag None
Jr.Manag Light
Jr.Manag Medium
Jr.Manag Heavy
Sr.Empl None
Sr.Empl Light
Sr.Empl Medium
....... .......
....... .......
....... .......

If you selected variables StaffGrp and Smoking for the analysis, Statistica would cross-tabulate those variables and compute the two-way frequency table.
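The tabulation step can be illustrated outside Statistica with a minimal Python sketch. The cases below are made-up values, and the crosstab helper is a hypothetical name introduced only for this illustration; it is not part of Statistica. In this sketch, cases whose codes are not listed are simply left out of the table.

```python
from collections import Counter

# Hypothetical raw data: one (STAFFGRP, SMOKING) pair of codes per case.
cases = [
    ("Sr.Manag", "None"), ("Sr.Manag", "Light"), ("Sr.Manag", "None"),
    ("Jr.Manag", "Heavy"), ("Jr.Manag", "Medium"), ("Sr.Empl", "None"),
    ("Sr.Empl", "Light"), ("Sr.Empl", "Light"),
]

def crosstab(pairs, row_codes, col_codes):
    """Return a two-way frequency table (list of rows) for the listed codes."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_codes] for r in row_codes]

row_codes = ["Sr.Manag", "Jr.Manag", "Sr.Empl"]
col_codes = ["None", "Light", "Medium", "Heavy"]
table = crosstab(cases, row_codes, col_codes)

# Print one table row per staff group.
for code, row in zip(row_codes, table):
    print(code, row)
```

Each row of table holds the frequencies for one staff group, in the order given by col_codes.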

Selection of variables and codes for simple correspondence analysis
To specify a simple correspondence analysis, click the Row and column variable(s) button to display the standard variable selection dialog box. If you select one row variable and one column variable, the analysis is performed on the two-way table defined by the categories of the two variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog box, in which you enter the codes (numbers or text values) that define the categories for the selected variables. If more than one variable is selected in the list of row or column variables, then all combinations of the categories of the variables in one list (for example, the rows) are cross-tabulated against the respective combinations of categories of the variables in the other list (for example, the columns). For example, in the following two-way table, the combinations of categories for the two column variables Age and Survival were tabulated against the combinations of categories for the two row variables Inflammation and Location.
                Age:    under 50    50 to 69     over 69
           Survival:    No   Yes    No   Yes    No   Yes
Inflamm   Location
MIN_MAL   TOKYO          9    26     9    20     2     1
MIN_MAL   BOSTON         6    11     8    18     9    15
MIN_MAL   GLAMORGN      16    16    14    27     3    12
MIN_BEGN  TOKYO          7    68     9    46     3     6
MIN_BEGN  BOSTON         7    24    20    58    18    26
MIN_BEGN  GLAMORGN       7    20    12    39     7    11
GRT_MAL   TOKYO          4    25    11    18     1     5
GRT_MAL   BOSTON         6     4     3    10     3     1
GRT_MAL   GLAMORGN       3     8     3    10     3     4
GRT_BEGN  TOKYO          3     9     2     5     0     1
GRT_BEGN  BOSTON         0     0     2     3     0     1
GRT_BEGN  GLAMORGN       0     1     0     4     0     1

In effect, the resulting table is a 4-way table, where the combinations of categories for the row and column variables are arranged to form a two-way table for the correspondence analysis.
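The arrangement of a multi-way table into two dimensions can be sketched in Python as follows. The cases are made-up values and the variable names (row_levels, col_levels, and so on) are assumptions for this illustration only. Every combination of the row-variable categories becomes one table row, and every combination of the column-variable categories becomes one table column.

```python
from collections import Counter
from itertools import product

# Hypothetical cases: (Inflamm, Location, Age, Survival) codes, one tuple per case.
cases = [
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "BOSTON", "over 69", "No"),
    ("GRT_BEGN", "TOKYO", "50 to 69", "Yes"),
    ("GRT_BEGN", "BOSTON", "under 50", "No"),
]

# Codes for the two row variables and the two column variables.
row_levels = [["MIN_MAL", "GRT_BEGN"], ["TOKYO", "BOSTON"]]
col_levels = [["under 50", "50 to 69", "over 69"], ["No", "Yes"]]

# All combinations of categories within each list.
row_keys = list(product(*row_levels))  # 2 x 2 = 4 table rows
col_keys = list(product(*col_levels))  # 3 x 2 = 6 table columns

# Count cases by (row combination, column combination).
counts = Counter(((c[0], c[1]), (c[2], c[3])) for c in cases)
table = [[counts[(r, c)] for c in col_keys] for r in row_keys]

for key, row in zip(row_keys, table):
    print(key, row)
```

The result is an ordinary two-way table of 4 rows by 6 columns, even though four variables define it.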

Selection of variables and codes for multiple correspondence analysis
To specify a multiple correspondence analysis, click the Variables (Factors in Burt Table) button to display the standard variable selection dialog box, in which you select variables for the analysis. The Burt table is computed for the categories of the selected variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog box, in which you enter the codes (numbers or text values) that define the categories for the selected variables. For example, suppose you selected variables Survival (Yes, No), Age (<50, 50-69, and 69+), and Location (Tokyo, Boston, and Glamorgn) for the analysis. The program would compute the following type of Burt table for the multiple correspondence analysis.
                             Survival                        Age                   Location
                          NO      YES      <50    50-69      69+    TOKYO   BOSTON GLAMORGN
SURVIVAL:NO              210        0       68       93       49       60       82       68
SURVIVAL:YES               0      554      212      258       84      230      171      153

AGE:UNDER_50              68      212      280        0        0      151       58       71
AGE:A_50TO69              93      258        0      351        0      120      122      109
AGE:OVER_69               49       84        0        0      133       19       73       41

LOCATION:TOKYO            60      230      151      120       19      290        0        0
LOCATION:BOSTON           82      171       58      122       73        0      253        0
LOCATION:GLAMORGN         68      153       71      109       41        0        0      221

The Burt table has a clearly defined structure. Overall, the data matrix is symmetric. In the case of 3 categorical variables, the data matrix consists of 3 x 3 = 9 partitions, created by tabulating each variable against itself and against the categories of all other variables. Note that the sum of the diagonal elements in each diagonal partition (that is, where the respective variable is tabulated against itself) is constant and equal to the total number of cases (764 in this example). Technically, the Burt table is the inner product of an indicator (design) matrix with itself; to analyze tables based on indicator matrices that incorporate fuzzy coding schemes, you can specify a Burt table directly as input (select the Frequencies w/out grouping vars option button in the Input group box of the Multiple Correspondence Analysis (MCA): Table Specifications dialog box). Refer to MCA - Introductory Overview for additional details.
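The structure described above can be checked with a small Python sketch that builds an indicator matrix Z (one row per case, one 0/1 column per category) and forms the Burt table as the inner product of Z with itself. The five cases are made-up values, and the names levels, indicator_row, Z, and burt are assumptions for this illustration only; this is not Statistica's implementation.

```python
# Hypothetical cases: (Survival, Age, Location) codes, one tuple per case.
cases = [
    ("NO", "<50", "TOKYO"),
    ("YES", "<50", "TOKYO"),
    ("YES", "50-69", "BOSTON"),
    ("YES", "69+", "GLAMORGN"),
    ("NO", "50-69", "BOSTON"),
]

# Codes defining the categories of each variable, in column order.
levels = [["NO", "YES"], ["<50", "50-69", "69+"], ["TOKYO", "BOSTON", "GLAMORGN"]]

def indicator_row(case):
    """Code one case as 0/1 indicators, one column per category."""
    row = []
    for value, codes in zip(case, levels):
        row.extend(1 if value == code else 0 for code in codes)
    return row

Z = [indicator_row(c) for c in cases]
k = len(Z[0])

# Burt table: entry (i, j) counts the cases that fall in both category i and category j.
burt = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]

for row in burt:
    print(row)
```

In the printed matrix, each diagonal partition is itself diagonal, its entries sum to the number of cases (5 here), and the matrix as a whole is symmetric.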

In addition to the variables defining the table for the analysis, you can designate some variables as Supplementary columns (variables). Note that unlike in simple correspondence analysis, where supplementary columns and rows can be added from the Correspondence Analysis Results - Supplementary points tab, in multiple correspondence analysis it is required that the supplementary columns also define a valid Burt table. Therefore, in this case click the Variables (Factors in Burt table) button to specify all variables for the analysis, and then click the Supplementary columns (variables) button to select the subset of those variables that are to be treated as supplementary columns. The variables selected as supplementary columns are not used for the computation of eigenvalues and eigenvectors (see Computational Details), but coordinate values are computed for those columns and reported in the spreadsheet and plots of coordinates.