Input Formats in Correspondence Analysis - Raw Data
If the Raw data (requires tabulation) option button is selected [from the Input group box on either the Correspondence Analysis (CA): Table Specifications Startup Panel - Correspondence Analysis (CA) tab or the Multiple Correspondence Analysis (MCA): Table Specifications Startup Panel - Multiple Correspondence Analysis (MCA) tab], Statistica expects as input categorical grouping variables whose code values uniquely identify the category to which each case belongs. Statistica then tabulates the respective variables to compute the input table. For example, the variables might contain the following codes:
STAFFGRP | SMOKING |
Sr.Manag | None |
Sr.Manag | Light |
Sr.Manag | Medium |
Sr.Manag | Heavy |
Jr.Manag | None |
Jr.Manag | Light |
Jr.Manag | Medium |
Jr.Manag | Heavy |
Sr.Empl | None |
Sr.Empl | Light |
Sr.Empl | Medium |
....... | ....... |
....... | ....... |
....... | ....... |
If you selected variables StaffGrp and Smoking for the analysis, Statistica would cross-tabulate those variables and compute the two-way frequency table.
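As a rough illustration of the tabulation step, the following Python sketch counts cases per combination of codes. It is not Statistica's implementation; the sample cases, the `crosstab` helper, and the choice to ignore cases with unlisted codes are assumptions made for this example.

```python
from collections import Counter

# Hypothetical raw data: one (STAFFGRP, SMOKING) pair of codes per case.
cases = [
    ("Sr.Manag", "None"),
    ("Sr.Manag", "Light"),
    ("Jr.Manag", "None"),
    ("Jr.Manag", "None"),
    ("Jr.Manag", "Heavy"),
    ("Sr.Empl", "Medium"),
    ("Sr.Empl", "Medium"),
    ("Sr.Empl", "Light"),
]

# Codes that define the categories, in the order they should appear.
row_codes = ["Sr.Manag", "Jr.Manag", "Sr.Empl"]
col_codes = ["None", "Light", "Medium", "Heavy"]


def crosstab(cases, row_codes, col_codes):
    """Return a two-way frequency table as a list of rows.

    Assumption of this sketch: cases whose codes are not listed
    in row_codes/col_codes simply do not appear in the table.
    """
    counts = Counter(cases)
    return [[counts[(r, c)] for c in col_codes] for r in row_codes]


table = crosstab(cases, row_codes, col_codes)
for code, row in zip(row_codes, table):
    print(f"{code:<10}", row)
```

Each row of `table` corresponds to one STAFFGRP code and each column to one SMOKING code, which is the shape of the two-way frequency table used as input to the analysis.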
- Selection of variables and codes for simple correspondence analysis
- To specify a simple correspondence analysis, click the Row and column variable(s) button to display the standard variable selection dialog box. If you select one row variable and one column variable, then the analysis is performed on the two-way table defined by the categories for the two variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog box, in which you enter the codes (numbers or text values) that define the categories for the selected variables. If more than one variable was selected for the list of row or column variables, then all combinations of the categories of the selected variables in one list (for example, rows) are crosstabulated against the respective combinations of categories for the variables in the other list (for example, columns). For example, in the following two-way table, the combinations of categories for the two column variables Age and Survival were tabulated against the combinations of categories for the two row variables Inflammation and Location.
Age: | | under 50 | under 50 | 50 to 69 | 50 to 69 | over 69 | over 69 |
Survival: | | No | Yes | No | Yes | No | Yes |
Inflamm | Location | | | | | | |
MIN_MAL | TOKYO | 9 | 26 | 9 | 20 | 2 | 1 |
MIN_MAL | BOSTON | 6 | 11 | 8 | 18 | 9 | 15 |
MIN_MAL | GLAMORGN | 16 | 16 | 14 | 27 | 3 | 12 |
MIN_BEGN | TOKYO | 7 | 68 | 9 | 46 | 3 | 6 |
MIN_BEGN | BOSTON | 7 | 24 | 20 | 58 | 18 | 26 |
MIN_BEGN | GLAMORGN | 7 | 20 | 12 | 39 | 7 | 11 |
GRT_MAL | TOKYO | 4 | 25 | 11 | 18 | 1 | 5 |
GRT_MAL | BOSTON | 6 | 4 | 3 | 10 | 3 | 1 |
GRT_MAL | GLAMORGN | 3 | 8 | 3 | 10 | 3 | 4 |
GRT_BEGN | TOKYO | 3 | 9 | 2 | 5 | 0 | 1 |
GRT_BEGN | BOSTON | 0 | 0 | 2 | 3 | 0 | 1 |
GRT_BEGN | GLAMORGN | 0 | 1 | 0 | 4 | 0 | 1 |
In effect, the resulting table is a 4-way table, where the combinations of categories for the row and column variables are arranged to form a two-way table for the correspondence analysis.
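The arrangement of a 4-way table as a two-way table can be sketched in Python as follows. This is an illustration only, not Statistica's code: the handful of cases is hypothetical, only a subset of the category codes is used to keep the output small, and the variable names are invented for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical cases: (Inflamm, Location, Age, Survival) codes per case.
cases = [
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "BOSTON", "over 69", "No"),
    ("GRT_BEGN", "TOKYO", "50 to 69", "Yes"),
    ("GRT_BEGN", "BOSTON", "50 to 69", "No"),
]

# Codes for the two row variables (Inflamm, Location) and the
# two column variables (Age, Survival); a subset chosen for brevity.
row_var_codes = [["MIN_MAL", "GRT_BEGN"], ["TOKYO", "BOSTON"]]
col_var_codes = [["under 50", "50 to 69", "over 69"], ["No", "Yes"]]

# Every combination of categories becomes one row (or column) label.
row_labels = list(product(*row_var_codes))  # 2 x 2 = 4 rows
col_labels = list(product(*col_var_codes))  # 3 x 2 = 6 columns

# Split each case into its row combination and its column combination,
# then count how often each (row combination, column combination) occurs.
counts = Counter((case[:2], case[2:]) for case in cases)
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]

for label, row in zip(row_labels, table):
    print(label, row)
```

The rows of `table` are indexed by (Inflamm, Location) pairs and the columns by (Age, Survival) pairs, so the four variables end up in a single two-way layout.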
- Selection of variables and codes for multiple correspondence analysis
- To specify a multiple correspondence analysis, click the Variables (Factors in Burt Table) button to display the standard variable selection dialog, in which you select variables for the analysis. The Burt table is computed for the categories of the selected variables. Select Codes for grouping variables to display the Select Codes for Coding Variables dialog box, in which you enter the codes (numbers or text values) that define the categories for the selected variables. For example, suppose you selected variables Survival (Yes, No), Age (<50, 50-69, and 69+), and Location (Tokyo, Boston, and Glamorgn) for the analysis. The program would compute the following type of Burt table for the multiple correspondence analysis.
 | Survival | Survival | Age | Age | Age | Location | Location | Location |
 | NO | YES | <50 | 50-69 | 69+ | TOKYO | BOSTON | GLAMORGN |
SURVIVAL:NO | 210 | 0 | 68 | 93 | 49 | 60 | 82 | 68 |
SURVIVAL:YES | 0 | 554 | 212 | 258 | 84 | 230 | 171 | 153 |
AGE:UNDER_50 | 68 | 212 | 280 | 0 | 0 | 151 | 58 | 71 |
AGE:A_50TO69 | 93 | 258 | 0 | 351 | 0 | 120 | 122 | 109 |
AGE:OVER_69 | 49 | 84 | 0 | 0 | 133 | 19 | 73 | 41 |
LOCATION:TOKYO | 60 | 230 | 151 | 120 | 19 | 290 | 0 | 0 |
LOCATION:BOSTON | 82 | 171 | 58 | 122 | 73 | 0 | 253 | 0 |
LOCATION:GLAMORGN | 68 | 153 | 71 | 109 | 41 | 0 | 0 | 221 |
The Burt table has a clearly defined structure. Overall, the data matrix is symmetrical. In the case of 3 categorical variables, the data matrix consists of 3 x 3 = 9 partitions, created by each variable being tabulated against itself, and against the categories of all other variables. Note that the sum of the diagonal elements in each diagonal partition (that is, where the respective variables are tabulated against themselves) is constant (equal to 764, the total number of cases, in this example).
Technically, the Burt table is the inner product of an indicator (design) matrix with itself. To analyze tables based on indicator matrices that incorporate fuzzy coding schemes, you can specify a Burt table directly as input (select the Frequencies w/out grouping vars option button in the Input group box of the Multiple Correspondence Analysis (MCA): Table Specifications dialog box). Refer to MCA - Introductory Overview for additional details.
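The relationship between the indicator matrix and the Burt table can be sketched in Python as follows. This is a minimal illustration, not Statistica's implementation: the five cases are hypothetical, and the names `Z`, `columns`, and `burt` are chosen for this example.

```python
# Hypothetical cases: (Survival, Age, Location) codes per case.
cases = [
    ("No", "<50", "Tokyo"),
    ("Yes", "<50", "Tokyo"),
    ("Yes", "50-69", "Boston"),
    ("Yes", "69+", "Glamorgn"),
    ("No", "50-69", "Boston"),
]

# Codes that define the categories of each variable.
codes = [
    ["No", "Yes"],
    ["<50", "50-69", "69+"],
    ["Tokyo", "Boston", "Glamorgn"],
]

# One indicator column per category, labeled (variable index, code).
columns = [(v, code) for v, var_codes in enumerate(codes) for code in var_codes]

# Indicator matrix Z: one row per case, 1 where the case has that code.
Z = [[1 if case[v] == code else 0 for v, code in columns] for case in cases]

# Burt table: inner product of Z with itself (Z transposed times Z).
n_cols = len(columns)
burt = [
    [sum(z[i] * z[j] for z in Z) for j in range(n_cols)]
    for i in range(n_cols)
]

for (v, code), row in zip(columns, burt):
    print(f"var{v}:{code:<9}", row)
```

In the result, each diagonal partition holds the category counts of one variable on its diagonal and zeros elsewhere, and those diagonal counts sum to the number of cases, mirroring the structure described above.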
In addition to the variables defining the table for the analysis, you can designate some variables as Supplementary columns (variables). Note that unlike in simple correspondence analysis, where supplementary columns and rows can be added from the Correspondence Analysis Results - Supplementary points tab, in multiple correspondence analysis it is required that the supplementary columns also define a valid Burt table. Therefore, in this case click the Variables (Factors in Burt table) button to specify all variables for the analysis, and then click the Supplementary columns (variables) button to select the subset of those variables that are to be treated as supplementary columns. The variables selected as supplementary columns are not used for the computation of eigenvalues and eigenvectors (see Computational Details), but coordinate values are computed for those columns and reported in the spreadsheet and plots of coordinates.