Association Rules - Technical Note on Coding of Multiple Response Variables

Computational Procedures and Terminology discusses the nature of multiple response variables; these variables are also described in some detail in the context of Basic Statistics (see Multiple Responses/Dichotomies). To avoid confusion when analyzing text values or codes, carefully review the coding used to prepare the data for the analyses. As described in Notes on Text Labels and Text Values, text values in Statistica are always associated with numeric values or codes; hence, all types of variables (including text variables) can be analyzed using numerical analysis techniques (e.g., you can compute means for text variables).

Because association rules are often applied to text mining tasks, the Statistica Association Rules module tabulates and compares the text representation of values across the columns (variables) in the data file that make up a multiple response variable. For example, if two variables make up a single multiple response variable, and both contain the text value Male, Statistica treats these values as identical in the two variables (columns) of the input spreadsheet, even if the numeric codes underlying the text value Male differ (e.g., Male could be associated with the numeric value 1 in the first variable and with the numeric value 2 in the second variable).
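The difference between tabulating by text value and tabulating by numeric code can be illustrated with a minimal Python sketch. This is not Statistica code; the code books, the stored codes, and the function names are hypothetical and chosen only to mirror the Male example above.

```python
from collections import Counter

# Hypothetical code books: numeric code -> text value, one per variable.
# "Male" is coded 1 in the first variable and 2 in the second.
labels_var1 = {1: "Male", 2: "Female"}
labels_var2 = {1: "Female", 2: "Male"}

# Hypothetical stored numeric codes for four cases in each variable.
codes_var1 = [1, 1, 1, 2]
codes_var2 = [2, 2, 1, 2]

columns = [(codes_var1, labels_var1), (codes_var2, labels_var2)]


def tabulate_by_text(cols):
    """Count values across columns after translating codes to text values."""
    counts = Counter()
    for codes, labels in cols:
        counts.update(labels[c] for c in codes)
    return counts


def tabulate_by_code(cols):
    """Count values across columns using only the raw numeric codes."""
    counts = Counter()
    for codes, _labels in cols:
        counts.update(codes)
    return counts


by_text = tabulate_by_text(columns)
by_code = tabulate_by_code(columns)

print(dict(by_text))  # {'Male': 6, 'Female': 2}
print(dict(by_code))  # {1: 4, 2: 4}
```

Tabulating by text value pools all six occurrences of Male, whereas tabulating by raw code mixes Male and Female within each code, because the same code means different things in the two variables.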

Hence, when applying association rules to text mining tasks, you can simply import the text file as-is (using Statistica's flexible file import facilities). The import usually assigns different numeric codes to the same text value in different variables; the Association Rules module will nevertheless identify identical text values and treat them as such, regardless of the underlying numeric values. This is not the case in any other module of Statistica: those modules regard the text values only as labels and use the underlying numeric values as the primary source for the calculations.
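As a rough illustration of why an as-is import can produce mismatched codes, the following Python sketch assumes a hypothetical importer that numbers text values in order of first appearance within each column. Statistica's actual code assignment scheme is not described here, so the encoder, the sample rows, and all names below are assumptions made for the example only.

```python
def encode_column(values):
    """Assign integer codes in order of first appearance (an assumed scheme)."""
    codebook = {}
    codes = []
    for value in values:
        if value not in codebook:
            codebook[value] = len(codebook) + 1
        codes.append(codebook[value])
    return codes, codebook


# Hypothetical imported text: one row per case, one column per item slot.
rows = [
    ("bread", "milk"),
    ("milk", "bread"),
    ("bread", "eggs"),
]

codes1, book1 = encode_column([r[0] for r in rows])
codes2, book2 = encode_column([r[1] for r in rows])

# The same text value receives different codes in the two columns.
print(book1["bread"], book2["bread"])  # 1 2

# Matching on text: every row contains "bread".
text_rows = sum("bread" in row for row in rows)

# Matching on the raw code that "bread" has in column 1: one row is missed,
# and a row is matched in column 2 where code 1 actually means "milk".
target = book1["bread"]
code_rows = sum(target in pair for pair in zip(codes1, codes2))

print(text_rows, code_rows)  # 3 2
```

Under these assumptions, comparing text values finds bread in all three rows, while comparing raw codes finds only two and conflates bread with milk in the first row.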