Association Rules - Tabular Representation of Associations

As described in Computational Procedures and Terminology, association rules take the general form if Body then Head, where Body and Head stand for single codes or text values (items) or conjunctions of them; e.g., if (Car=Porsche and Age<20) then (Risk=High and Insurance=High). The major statistics computed for each rule are Support (the relative frequency of the Body or Head of the rule), Confidence (the conditional probability of the Head given the Body), and Correlation (the support for Body and Head together, divided by the square root of the product of the support for the Body and the support for the Head; see Computational Procedures and Terminology). These statistics can be summarized in a spreadsheet, as shown below.
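To make the three definitions concrete, the following minimal Python sketch computes them for one rule over a handful of transactions. It is an illustration only, not Statistica's implementation: the function name rule_statistics, the representation of items as plain strings, and the five toy transactions are all assumptions made up for this example.

```python
from math import sqrt


def rule_statistics(transactions, body, head):
    """Support, confidence, and correlation for the rule: if body then head.

    Hypothetical helper for illustration. Each transaction, and the body
    and head of the rule, is an iterable of item labels.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    if n == 0:
        raise ValueError("at least one transaction is required")
    body, head = frozenset(body), frozenset(head)

    # Support: fraction of transactions containing all items of the set.
    support_body = sum(body <= t for t in sets) / n
    support_head = sum(head <= t for t in sets) / n
    support_both = sum((body | head) <= t for t in sets) / n

    # Confidence: conditional probability of the head given the body.
    confidence = support_both / support_body if support_body else 0.0

    # Correlation: joint support divided by the square root of the
    # product of the body support and the head support.
    denom = sqrt(support_body * support_head)
    correlation = support_both / denom if denom else 0.0

    return {
        "support_body": support_body,
        "support_head": support_head,
        "support_both": support_both,
        "confidence": confidence,
        "correlation": correlation,
    }


# Toy data, invented for this example only.
transactions = [
    {"Car=Porsche", "Age<20", "Risk=High", "Insurance=High"},
    {"Car=Porsche", "Age<20", "Risk=High", "Insurance=High"},
    {"Car=Porsche", "Age<20", "Risk=Low"},
    {"Car=Sedan", "Risk=High", "Insurance=High"},
    {"Car=Sedan", "Risk=Low"},
]

stats = rule_statistics(
    transactions,
    body={"Car=Porsche", "Age<20"},
    head={"Risk=High", "Insurance=High"},
)
for name, value in stats.items():
    # Multiply by 100 to express each statistic in percent.
    print(f"{name}: {100 * value:.1f}%")
```

The sketch returns proportions between 0 and 1; the final loop multiplies by 100 only for display, so the values can be read as percentages.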

  1. This results spreadsheet shows an example of how association rules can be applied to text mining tasks. The analysis was performed on the paragraphs (dialog spoken by the characters in the play) in the first scene of Shakespeare's All's Well That Ends Well, after removing a few very frequent words such as is and of. The values for support, confidence, and correlation are expressed in percent. Note that the rules in the results spreadsheet were sorted by the Correlation column, using the standard Data - Sort facilities of Statistica.