Association Rules Overview
The goal of association rules techniques is to detect relationships or associations between specific values of categorical variables in large data sets. This is a common task in many data mining projects and in text mining, a subcategory of data mining. These exploratory techniques have a wide range of applications in business practice and in research, from the analysis of consumer preferences or human resource management to the history of language. They make it possible for analysts and researchers to uncover hidden patterns in large data sets, such as "customers who order product A often also order product B or C" or "employees who said positive things about initiative X also frequently complain about issue Y but are happy with issue Z." The implementation of the so-called a priori algorithm (see Agrawal and Swami, 1993; Agrawal and Srikant, 1994; Han and Lakshmanan, 2001; see also Witten and Frank, 2000) in Statistica enables you to process huge data sets rapidly for such associations, based on predefined "threshold" values for detection.
To summarize, you can use the Association Rules module of Statistica to find rules of the kind "If X then (likely) Y", where X and Y can each be a single value, item, or word, or a conjunction of several values, items, or words (e.g., if (Car=Porsche and Gender=Male and Age<20) then (Risk=High and Insurance=High)). The program can be used to analyze simple categorical variables, dichotomous variables, and/or multiple response variables. The algorithm determines association rules without requiring the user to specify the number of distinct categories present in the data, or to have any prior knowledge of the maximum factorial degree or complexity of the important associations. In a sense, the algorithm constructs crosstabulation tables without the need to specify the number of dimensions for the tables or the number of categories for each dimension. Hence, this technique is particularly well suited for data and text mining of huge databases.
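The idea of finding "If X then (likely) Y" rules from predefined thresholds can be illustrated with a small Python sketch. This is not Statistica's implementation; it is a minimal level-wise search in the style of the a priori algorithm, under the assumption that the thresholds are the usual support (fraction of cases containing an itemset) and confidence (support of the whole rule divided by support of its "if" part). The function names frequent_itemsets and association_rules, the parameters min_support and min_confidence, and the sample records are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for itemsets whose support is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct item seen in the data.
    # No list of categories has to be supplied in advance.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates that occur in enough transactions.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Build (k+1)-item candidates from frequent k-itemsets, keeping
        # only those whose k-item subsets are all frequent.
        k += 1
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) == k and all(
                    frozenset(s) in level for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Derive (body, head, support, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for body in combinations(itemset, r):
                body = frozenset(body)
                head = itemset - body
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already available.
                confidence = support / frequent[body]
                if confidence >= min_confidence:
                    rules.append((body, head, support, confidence))
    return rules


# Hypothetical records, each coded as a set of "Variable=Value" items.
records = [
    {"Car=Porsche", "Gender=Male", "Risk=High"},
    {"Car=Porsche", "Gender=Male", "Risk=High"},
    {"Car=Porsche", "Gender=Female", "Risk=Low"},
    {"Car=Sedan", "Gender=Male", "Risk=Low"},
    {"Car=Sedan", "Gender=Female", "Risk=Low"},
]

frequent = frequent_itemsets(records, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
for body, head, support, confidence in rules:
    print(sorted(body), "->", sorted(head), support, confidence)
```

With these sample records, the rule "if (Car=Porsche and Gender=Male) then (Risk=High)" passes both thresholds, while "if (Car=Porsche) then (Risk=High)" does not, because one of the three Porsche records has Risk=Low. Coding each case as a set of "Variable=Value" items is a design choice for the sketch: it lets the same code handle categorical, dichotomous, and multiple response variables without declaring table dimensions in advance.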