Input Formats in Correspondence Analysis - Frequencies with Grouping Variables
If the Frequencies with grouping variables option button is selected [from the Input group box on either the Correspondence Analysis (CA): Table Specifications Startup Panel - Correspondence Analysis (CA) tab or the Multiple Correspondence Analysis (MCA): Table Specifications Startup Panel - Multiple Correspondence Analysis (MCA) tab], Statistica expects two kinds of input: grouping variables whose code values uniquely identify each category, and one additional variable containing the frequencies (or some other measure of correspondence) for the cells identified by those grouping variables. For example, the file may look like this:
STAFFGRP | SMOKING | FREQUENCY |
Sr.Manag | None | 4 |
Sr.Manag | Light | 2 |
Sr.Manag | Medium | 3 |
Sr.Manag | Heavy | 2 |
Jr.Manag | None | 4 |
Jr.Manag | Light | 3 |
Jr.Manag | Medium | 7 |
Jr.Manag | Heavy | 4 |
Sr.Empl | None | 25 |
Sr.Empl | Light | 10 |
Sr.Empl | Medium | 12 |
....... | ....... | ....... |
....... | ....... | ....... |
....... | ....... | ....... |
If you selected variables StaffGrp and Smoking for the analysis, and variable Frequency as the Variable with frequencies/counts, then Statistica would assign the respective value for variable Frequency to each cell in the table identified by the grouping variables.
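As an illustration only, the assignment of frequencies to table cells can be sketched in Python. This is not Statistica's implementation; the names `records` and `build_table` are hypothetical, and the data are just the eleven rows shown in the listing above (the elided rows are left out).

```python
from collections import defaultdict

# Rows copied from the listing above; the elided rows are omitted.
# Each tuple is (STAFFGRP, SMOKING, FREQUENCY).
records = [
    ("Sr.Manag", "None", 4),
    ("Sr.Manag", "Light", 2),
    ("Sr.Manag", "Medium", 3),
    ("Sr.Manag", "Heavy", 2),
    ("Jr.Manag", "None", 4),
    ("Jr.Manag", "Light", 3),
    ("Jr.Manag", "Medium", 7),
    ("Jr.Manag", "Heavy", 4),
    ("Sr.Empl", "None", 25),
    ("Sr.Empl", "Light", 10),
    ("Sr.Empl", "Medium", 12),
]


def build_table(rows):
    """Hypothetical helper: place each frequency in the cell named by its two codes."""
    table = defaultdict(lambda: defaultdict(int))
    for row_code, col_code, freq in rows:
        table[row_code][col_code] += freq
    # Convert to plain dicts so missing cells raise KeyError instead of silently appearing.
    return {row: dict(cols) for row, cols in table.items()}


table = build_table(records)
print(table["Jr.Manag"]["Medium"])      # 7
print(sum(table["Sr.Manag"].values()))  # 11 (row total for Sr.Manag)
```

Here the first grouping variable defines the rows of the table and the second defines the columns; the frequency variable supplies the cell contents.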
- Selection of variables and codes
- The required selection of variables and codes is the same as that described under the option Raw data (requires tabulation), except that you will also be prompted to select the Variable with frequencies/counts (that is, the variable containing the measure of correspondence, similarity, confusion, or association). Note that only positive values or zero are allowed in that variable (for example, Statistica does not permit negative frequencies).
- Multiple references to the same cell
- If there are multiple references to the same cell in the table, then the multiple values for the frequency variable are summed, and the sum is assigned to the respective cell in the table. For example, consider the following data specifying a 2 by 2 table:
There are two references to the cell Male-High (that is, the first two cases in the listing above). Thus, the frequency assigned to that cell will be 4+6=10. This way of handling multiple references enables you to analyze subsets of tables that are coded in this manner. For example, suppose you had three grouping variables Gender, Income, and Occupation, and a fourth variable containing the frequencies for each cell in the three-way table. If you now selected only Gender and Income for the analysis, then Statistica would sum all the frequencies that fall into each cell of the two-way table defined by those two variables and, in effect, compute the Gender by Income marginal frequency table, collapsing across the levels of the variable Occupation.
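The summing rule and the collapsing effect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Statistica's own procedure: the helper `tabulate`, the field name `Freq`, the Occupation codes, and every frequency other than the two Male-High values (4 and 6) are hypothetical and chosen only for the example.

```python
from collections import Counter


def tabulate(cases, grouping_vars, freq_var="Freq"):
    """Hypothetical helper: sum the frequency variable over every case
    that refers to the same cell of the selected grouping variables."""
    cells = Counter()
    for case in cases:
        freq = case[freq_var]
        if freq < 0:
            # Mirrors the rule that only zero or positive values are allowed.
            raise ValueError("negative frequencies are not permitted")
        cell = tuple(case[var] for var in grouping_vars)
        cells[cell] += freq
    return cells


# Only the two Male-High frequencies (4 and 6) come from the text;
# the Occupation codes and the remaining rows are invented for illustration.
cases = [
    {"Gender": "Male", "Income": "High", "Occupation": "Clerical", "Freq": 4},
    {"Gender": "Male", "Income": "High", "Occupation": "Technical", "Freq": 6},
    {"Gender": "Male", "Income": "Low", "Occupation": "Clerical", "Freq": 3},
    {"Gender": "Female", "Income": "High", "Occupation": "Clerical", "Freq": 5},
    {"Gender": "Female", "Income": "Low", "Occupation": "Technical", "Freq": 2},
]

# Selecting all three variables keeps the two Male-High cases in separate cells.
three_way = tabulate(cases, ["Gender", "Income", "Occupation"])
print(three_way[("Male", "High", "Clerical")])  # 4

# Selecting only Gender and Income collapses across Occupation,
# so the two Male-High references are summed: 4 + 6 = 10.
two_way = tabulate(cases, ["Gender", "Income"])
print(two_way[("Male", "High")])  # 10
```

Because the same summing step serves both purposes, no separate "collapse" operation is needed in this sketch: omitting a grouping variable from the selection is enough to produce the marginal table.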