Multiple Responses/Dichotomies - Multiple Response Variables

As part of a larger market survey, suppose you asked a sample of consumers to name their three favorite soft drinks. The specific item on the questionnaire may look like this:

Write down your three favorite soft drinks:

1:________ 2:________ 3:________

Thus, each questionnaire returned to you contains between 0 and 3 answers to this item, and a wide variety of soft drinks is most likely named. Your goal is to summarize the responses to this item; that is, to produce a table showing the percentage of respondents who mentioned each soft drink.

The next question is how to enter the responses into a data file. Suppose 50 different soft drinks were mentioned among all of the questionnaires. You could of course set up 50 variables, one for each soft drink, and then, for each respondent, enter a 1 in every variable whose soft drink he or she mentioned (and a 0 otherwise); for example:
         COKE  PEPSI  SPRITE  ....
case 1      0      1       0
case 2      1      1       0
case 3      0      0       1
...       ...    ...     ...

This method of coding the responses would be very tedious and wasteful. Note that each respondent can give at most three responses; yet we use 50 variables to code those responses. (However, if we are only interested in the three soft drinks shown above, this method of coding just those three variables would be satisfactory; to tabulate soft drink preferences, we could then treat the three variables as a multiple dichotomy.)
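As a rough illustration of tabulating a multiple dichotomy, the following Python sketch counts the 1s in each variable and expresses them as a percentage of cases. The data are just the three example cases listed above, and the function name tabulate_dichotomies is a hypothetical helper introduced here, not part of any statistics package.

```python
# Hypothetical 0/1 data mirroring the three example cases listed above.
dichotomies = [
    {"COKE": 0, "PEPSI": 1, "SPRITE": 0},  # case 1
    {"COKE": 1, "PEPSI": 1, "SPRITE": 0},  # case 2
    {"COKE": 0, "PEPSI": 0, "SPRITE": 1},  # case 3
]


def tabulate_dichotomies(rows, counted_value=1):
    """Return {variable: (count, percent_of_cases)} for a multiple dichotomy.

    A case contributes to a variable when its entry equals counted_value.
    """
    n_cases = len(rows)
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value == counted_value else 0)
    return {
        name: (count, 100.0 * count / n_cases)
        for name, count in counts.items()
    }


dichotomy_table = tabulate_dichotomies(dichotomies)
for name, (count, pct_cases) in dichotomy_table.items():
    print(f"{name:<8}{count:>4}{pct_cases:>8.2f}")
```

With only three cases the percentages are not meaningful in themselves; the point is that each 0/1 variable is simply summed and divided by the number of cases.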

Coding multiple response variables

Alternatively, we could set up three variables, and a coding scheme for the 50 soft drinks. Then we could enter the respective codes (or alpha labels) into the three variables, in the same way that respondents wrote them down in the questionnaire.
         Resp. 1   Resp. 2    Resp. 3
case 1   COKE      PEPSI      JOLT
case 2   SPRITE    SNAPPLE    DR PEPPER
case 3   PERRIER   GATORADE   MOUNTAIN DEW
...      ...       ...        ...

To produce a table of the number of respondents by soft drink, we would now treat Resp. 1 to Resp. 3 as a multiple response variable. That table could look like this:
N=500
                                  Prcnt. of   Prcnt. of
Category                  Count   Responses   Cases
COKE: Coca Cola              44        5.23        8.80
PEPSI: Pepsi Cola            43        5.11        8.60
MOUNTAIN: Mountain Dew       81        9.62       16.20
PEPPER: Doctor Pepper        74        8.79       14.80
...: ...                     ..         ...         ...
                            842      100.00      168.40

Interpreting the multiple response frequency table

The total number of respondents was n = 500. Note that the counts in the first column of the table do not add up to 500, but rather to 842. That is the total number of responses; since each respondent could make up to 3 responses (write down three names of soft drinks), the total number of responses is naturally greater than the number of respondents. For example, referring back to the sample listing of the data file shown above, the first case (Coke, Pepsi, Jolt) "contributes" three times to the frequency table, once to the category Coke, once to the category Pepsi, and once to the category Jolt. The second and third columns in the table above report the percentages relative to the number of responses (second column) as well as respondents (third column). Thus, the entry 8.80 in the first row and last column in the table above means that 8.8% of all respondents mentioned Coke either as their first, second, or third soft drink preference.
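The two percentage columns can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the small data set is invented for illustration (it is not the 500 survey cases), blank answers are coded as None, no respondent repeats an answer, and multiple_response_table is a hypothetical helper name. The last two lines check the Coke row using the totals reported in the table (44 mentions, 842 responses, 500 cases).

```python
from collections import Counter


def multiple_response_table(cases):
    """Return {category: (count, pct_of_responses, pct_of_cases)}.

    cases holds one tuple of answers per respondent; None or "" marks a
    blank slot. Assumes no respondent lists the same answer twice.
    """
    n_cases = len(cases)
    counts = Counter(
        answer for answers in cases for answer in answers if answer
    )
    n_responses = sum(counts.values())
    return {
        category: (count, 100.0 * count / n_responses, 100.0 * count / n_cases)
        for category, count in counts.items()
    }


# Invented mini data set for illustration only.
cases = [
    ("COKE", "PEPSI", "JOLT"),
    ("COKE", "SPRITE", None),
    ("PEPSI", None, None),
    (None, None, None),  # no answers given, but still counted as a case
]
table = multiple_response_table(cases)
for category, (count, pct_resp, pct_cases) in sorted(table.items()):
    print(f"{category:<8}{count:>4}{pct_resp:>8.2f}{pct_cases:>8.2f}")

# Check of the Coke row reported in the table above.
coke_pct_responses = round(100.0 * 44 / 842, 2)  # percent of responses
coke_pct_cases = round(100.0 * 44 / 500, 2)      # percent of cases
```

Because the percent-of-responses column divides by the number of responses, it always sums to 100; the percent-of-cases column divides by the number of respondents and can therefore sum to more than 100, as in the 168.40 total above.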

Repeated identical responses

Unlike some other popular programs for computing tables for multiple response variables, the Multiple Response Tables option in the Basic Statistics and Tables module by default will ignore multiple identical responses. For example, suppose a respondent lists as his three preferences Jolt, Jolt, Jolt. Statistica will count that case only once, and consequently that person will only contribute once to the Jolt category in the frequency table.
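The effect of ignoring repeated identical responses can be sketched in Python as follows. This is only an illustration of the counting rule described above, not Statistica's implementation; the function count_mentions and its ignore_repeats flag are hypothetical names introduced for this example.

```python
from collections import Counter


def count_mentions(cases, ignore_repeats=True):
    """Count how often each answer is mentioned across all cases.

    With ignore_repeats=True, a case contributes at most once to each
    category, no matter how often the respondent repeated that answer.
    """
    counts = Counter()
    for answers in cases:
        given = [answer for answer in answers if answer]  # drop blank slots
        if ignore_repeats:
            given = set(given)  # keep each distinct answer once per case
        counts.update(given)
    return counts


# Invented example: the first respondent wrote Jolt three times.
cases = [
    ("JOLT", "JOLT", "JOLT"),
    ("COKE", "JOLT", "PEPSI"),
]
counted_once = count_mentions(cases)
counted_every_time = count_mentions(cases, ignore_repeats=False)
print(counted_once["JOLT"], counted_every_time["JOLT"])  # prints: 2 4
```

Counting each distinct answer once per case keeps the percent-of-cases column interpretable as the share of respondents who mentioned a drink at all.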