Input Formats in Correspondence Analysis - Raw Data

If the Raw data (requires tabulation) option button is selected in the Input group box (on either the Correspondence Analysis (CA): Table Specifications Startup Panel - Correspondence Analysis (CA) tab or the Multiple Correspondence Analysis (MCA): Table Specifications Startup Panel - Multiple Correspondence Analysis (MCA) tab), Statistica expects categorical grouping variables as input, with code values that uniquely identify the category to which each case belongs. Statistica then tabulates these variables to compute the input table. For example, the variables might contain the following codes:

STAFFGRP	SMOKING
Sr.Manag	None
Sr.Manag	Light
Sr.Manag	Medium
Sr.Manag	Heavy
Jr.Manag	None
Jr.Manag	Light
Jr.Manag	Medium
Jr.Manag	Heavy
Sr.Empl	None
Sr.Empl	Light
Sr.Empl	Medium
.......	.......
.......	.......
.......	.......

If you selected variables StaffGrp and Smoking for the analysis, Statistica would cross-tabulate those variables and compute the two-way frequency table.
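The tabulation step can be sketched outside Statistica in a few lines of plain Python. This is only an illustration of the idea, not Statistica's implementation: the six cases below are hypothetical (the listing above does not show how many cases carry each code pair), and the helper name crosstab is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical raw data: one (STAFFGRP, SMOKING) pair of codes per case.
cases = [
    ("Sr.Manag", "None"), ("Sr.Manag", "Light"),
    ("Jr.Manag", "None"), ("Jr.Manag", "None"), ("Jr.Manag", "Heavy"),
    ("Sr.Empl", "Medium"),
]

def crosstab(cases, row_codes, col_codes):
    """Count cases per (row code, column code) cell; unlisted codes are ignored."""
    counts = Counter(cases)
    return [[counts[(r, c)] for c in col_codes] for r in row_codes]

row_codes = ["Sr.Manag", "Jr.Manag", "Sr.Empl"]
col_codes = ["None", "Light", "Medium", "Heavy"]
table = crosstab(cases, row_codes, col_codes)

# Print the two-way frequency table, one staff group per line.
for code, row in zip(row_codes, table):
    print(code, row)
```

Passing the code lists explicitly mirrors the role of the selected codes: only the categories you list define rows and columns of the input table.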

Selection of variables and codes for simple correspondence analysis

To specify a simple correspondence analysis, click the Row and column variable(s) button to display the standard variable selection dialog box. If you select one row variable and one column variable, then the analysis is performed on the two-way table defined by the categories for the two variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog box, in which you enter the codes (numbers or text values) that define the categories for the selected variables. If more than one variable was selected for the list of row or column variables, then all combinations of the categories of the selected variables in one list (for example, rows) are cross-tabulated against the respective combinations of categories for the variables in the other list (for example, columns). For example, in the following two-way table, the combinations of categories for the two column variables Age and Survival were tabulated against the combinations of categories for the two row variables Inflammation and Location.

Age:		under 50		50 to 69		over 69
Survival:		No	Yes	No	Yes	No	Yes
Inflamm	Location
MIN_MAL	TOKYO	9	26	9	20	2	1
MIN_MAL	BOSTON	6	11	8	18	9	15
MIN_MAL	GLAMORGN	16	16	14	27	3	12
MIN_BEGN	TOKYO	7	68	9	46	3	6
MIN_BEGN	BOSTON	7	24	20	58	18	26
MIN_BEGN	GLAMORGN	7	20	12	39	7	11
GRT_MAL	TOKYO	4	25	11	18	1	5
GRT_MAL	BOSTON	6	4	3	10	3	1
GRT_MAL	GLAMORGN	3	8	3	10	3	4
GRT_BEGN	TOKYO	3	9	2	5	0	1
GRT_BEGN	BOSTON	0	0	2	3	0	1
GRT_BEGN	GLAMORGN	0	1	0	4	0	1

In effect, the resulting table is a 4-way table, where the combinations of categories for the row and column variables are arranged to form a two-way table for the correspondence analysis.
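A minimal sketch of how several row variables and several column variables can be stacked into one two-way table follows. It uses plain Python rather than Statistica; the four cases and the reduced category lists are hypothetical and chosen only to keep the example short.

```python
from collections import Counter
from itertools import product

# Hypothetical cases: (Inflamm, Location, Age, Survival) codes per case.
cases = [
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "BOSTON", "50 to 69", "No"),
    ("GRT_BEGN", "TOKYO", "over 69", "Yes"),
]

row_levels = [["MIN_MAL", "GRT_BEGN"], ["TOKYO", "BOSTON"]]        # Inflamm, Location
col_levels = [["under 50", "50 to 69", "over 69"], ["No", "Yes"]]  # Age, Survival

# Each stacked row (or column) is one combination of categories.
rows = list(product(*row_levels))
cols = list(product(*col_levels))

# A cell counts the cases whose four codes equal the row combination
# followed by the column combination.
counts = Counter(cases)
table = [[counts[r + c] for c in cols] for r in rows]

for r, row in zip(rows, table):
    print(r, row)
```

With two row variables of 2 categories each and column variables of 3 and 2 categories, the result has 4 rows and 6 columns, in the same arrangement as the table above.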

Selection of variables and codes for multiple correspondence analysis

To specify a multiple correspondence analysis, click the Variables (Factors in Burt Table) button to display the standard variable selection dialog box, in which you select variables for the analysis. The Burt table is computed for the categories of the selected variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog box, in which you enter the codes (numbers or text values) that define the categories for the selected variables. For example, suppose you selected variables Survival (Yes, No), Age (<50, 50-69, and 69+), and Location (Tokyo, Boston, and Glamorgn) for the analysis. The program would compute the following type of Burt table for the multiple correspondence analysis.

	Survival		Age			Location
	NO	YES	<50	50-69	69+	TOKYO	BOSTON	GLAMORGN
SURVIVAL:NO	210	0	68	93	49	60	82	68
SURVIVAL:YES	0	554	212	258	84	230	171	153

AGE:UNDER_50	68	212	280	0	0	151	58	71
AGE:A_50TO69	93	258	0	351	0	120	122	109
AGE:OVER_69	49	84	0	0	133	19	73	41

LOCATION:TOKYO	60	230	151	120	19	290	0	0
LOCATION:BOSTON	82	171	58	122	73	0	253	0
LOCATION:GLAMORGN	68	153	71	109	41	0	0	221

The Burt table has a clearly defined structure. Overall, the data matrix is symmetrical. In the case of 3 categorical variables, the data matrix consists of 3 x 3 = 9 partitions, created by each variable being tabulated against itself, and against the categories of all other variables. Note that the sum of the diagonal elements in each diagonal partition (that is, where the respective variables are tabulated against themselves) is constant (equal to 764 in this case); because each case belongs to exactly one category of each variable, this sum is the total number of cases. Technically, the Burt table is the result of the inner product of an indicator (design) matrix with itself. To analyze tables based on indicator matrices that incorporate fuzzy coding schemes, you can specify as input a Burt table directly (select the Frequencies w/out grouping vars option button in the Input group box of the Multiple Correspondence Analysis (MCA): Table Specifications dialog box). Refer to MCA - Introductory Overview for additional details.
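The structure described here can be checked with a short plain-Python sketch that builds a Burt table for Survival, Age, and Location from the cell counts of the four-way table shown earlier, ignoring Inflamm. This is an illustration, not Statistica's own computation. The row and column ordering of counts is read off that table, and the variable names are chosen for this sketch. Instead of materializing the indicator matrix Z, the loop accumulates Z'Z one cell at a time.

```python
from itertools import product

survs = ["NO", "YES"]
ages = ["UNDER_50", "A_50TO69", "OVER_69"]
locs = ["TOKYO", "BOSTON", "GLAMORGN"]

# Cell counts copied from the four-way table: 12 rows (Inflamm x Location,
# with Location cycling TOKYO, BOSTON, GLAMORGN) by 6 columns (Age x Survival,
# with Survival cycling No, Yes).
counts = [
    [9, 26, 9, 20, 2, 1],
    [6, 11, 8, 18, 9, 15],
    [16, 16, 14, 27, 3, 12],
    [7, 68, 9, 46, 3, 6],
    [7, 24, 20, 58, 18, 26],
    [7, 20, 12, 39, 7, 11],
    [4, 25, 11, 18, 1, 5],
    [6, 4, 3, 10, 3, 1],
    [3, 8, 3, 10, 3, 4],
    [3, 9, 2, 5, 0, 1],
    [0, 0, 2, 3, 0, 1],
    [0, 1, 0, 4, 0, 1],
]

labels = (
    [f"SURVIVAL:{s}" for s in survs]
    + [f"AGE:{a}" for a in ages]
    + [f"LOCATION:{l}" for l in locs]
)
index = {label: i for i, label in enumerate(labels)}
burt = [[0] * len(labels) for _ in labels]

col_keys = list(product(ages, survs))  # Age outer, Survival inner

for r, row in enumerate(counts):
    loc = locs[r % len(locs)]
    for (age, surv), n in zip(col_keys, row):
        # The three categories that every case in this cell belongs to.
        active = [index[f"SURVIVAL:{surv}"], index[f"AGE:{age}"], index[f"LOCATION:{loc}"]]
        # Each case adds 1 to every pair of its active categories (Z'Z).
        for i, j in product(active, repeat=2):
            burt[i][j] += n

for label, row in zip(labels, burt):
    print(f"{label:18s}", row)
```

The off-diagonal zeros inside each diagonal partition arise because no case can fall into two categories of the same variable, and the diagonal entries of each such partition add up to the number of cases.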

In addition to the variables defining the table for the analysis, you can designate some variables as Supplementary columns (variables). Note that, unlike in simple correspondence analysis, where supplementary columns and rows can be added from the Correspondence Analysis Results - Supplementary points tab, in multiple correspondence analysis the supplementary columns must also define a valid Burt table. Therefore, in this case click the Variables (Factors in Burt table) button to specify all variables for the analysis, and then click the Supplementary columns (variables) button to select the subset of those variables that are to be treated as supplementary columns. The variables selected as supplementary columns are not used for the computation of eigenvalues and eigenvectors (see Computational Details), but coordinate values are computed for those columns and reported in the spreadsheet and plots of coordinates.
