Sequence, Association and Link Analysis Module Overview

The Sequence, Association and Link Analysis (SAL) module is an implementation of several state-of-the-art techniques specifically designed for extracting rules from data sets (databases) that can be generally characterized as "market baskets."

The "Market Basket" metaphor
The market basket problem assumes that there are a large number of products that can be purchased by the customer, either in a single transaction or over time in a sequence of transactions. Such products can be goods displayed in a supermarket, spanning a wide range of items from groceries to electrical appliances, or they can be insurance packages that customers might be willing to purchase. Customers fill their basket with only a fraction of what is on display or on offer. See technical notes for further details.
Association rules
Association rules can be extracted from a database of such transactions to determine which products are frequently purchased together. For example, you might find that purchases of flashlights also typically coincide with purchases of batteries in the same basket.  
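The two measures most commonly used to rank such rules are support (the share of baskets that contain both items) and confidence (the share of baskets containing the first item that also contain the second). The following is a minimal sketch on made-up data. The `baskets` list, the `pair_rules` function, and the thresholds are hypothetical illustrations, not the algorithm used by the SAL module.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; each set is one market basket.
baskets = [
    {"flashlight", "batteries"},
    {"flashlight", "batteries", "bread"},
    {"bread", "milk"},
    {"flashlight", "milk"},
]

def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return sorted (body, head, support, confidence) tuples for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for body, head in ((a, b), (b, a)):
            confidence = count / item_counts[body]
            if confidence >= min_confidence:
                rules.append((body, head, support, confidence))
    return sorted(rules)

rules = pair_rules(baskets)
for body, head, support, confidence in rules:
    print(f"{body} -> {head}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket with batteries also contains a flashlight, so the rule "batteries -> flashlight" has higher confidence than the reverse rule even though both share the same support.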
Sequence analysis
Sequence analysis is concerned with the subsequent purchase of a product or products given a previous purchase. For instance, the purchase of an extended warranty is more likely to follow (in that specific sequential order) the purchase of a TV or other electrical appliance. Sequence rules, however, are not always that obvious, and sequence analysis helps you to extract such rules no matter how hidden they may be in your market basket data. Sequence analysis has a wide range of applications in many areas of industry, including customer shopping patterns, phone call patterns, stock market fluctuations, DNA sequences, and Web log streams.
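What separates a sequence rule from an association rule is that order matters: the head item must be bought in a later transaction than the body item. The sketch below counts such ordered pairs per customer. The `histories` data and the `sequence_rules` function are hypothetical and only illustrate the idea; they do not describe how SAL itself extracts sequence rules.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> time-ordered list of baskets.
histories = {
    "c1": [{"tv"}, {"warranty"}],
    "c2": [{"tv"}, {"warranty"}],
    "c3": [{"warranty"}, {"tv"}],  # reverse order: does not support tv -> warranty
    "c4": [{"tv"}],
}

def sequence_rules(histories, min_confidence=0.5):
    """Map (body, head) to (support, confidence), counting each customer once."""
    body_counts = Counter()  # customers who bought the item at any time
    seq_counts = Counter()   # customers who bought head strictly after body
    for transactions in histories.values():
        seen_before = set()
        followed = set()
        for basket in transactions:
            for earlier in seen_before:
                for later in basket:
                    if earlier != later:
                        followed.add((earlier, later))
            seen_before |= basket
        body_counts.update(seen_before)
        seq_counts.update(followed)
    n = len(histories)
    return {
        pair: (count / n, count / body_counts[pair[0]])
        for pair, count in seq_counts.items()
        if count / body_counts[pair[0]] >= min_confidence
    }

seq_rules = sequence_rules(histories)
print(seq_rules)
```

Here two of the four customers who bought a TV later bought a warranty, so "tv -> warranty" passes the threshold, while "warranty -> tv" (one of three customers) does not.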
Link analysis
Once extracted, rules about associations or the sequences of items as they occur in a transaction database can be extremely useful for numerous applications. Obviously, in retailing or marketing, knowledge of purchase "patterns" can help with the direct marketing of special offers to the "right" or "ready" customers (i.e., those who, according to the rules, are most likely to purchase specific items given their observed past consumption patterns). However, transaction databases occur in many areas of business, such as banking. In fact, the term "link analysis" is often used when these techniques - for extracting sequential or non-sequential association rules - are applied to organize complex "evidence." It is easy to see how the "transactions" or "shopping basket" metaphor can be applied to situations where individuals engage in certain actions, open accounts, contact other specific individuals, and so on. Applying the technologies described here to such databases may quickly extract patterns and associations between individuals and actions and, hence, for example, reveal the patterns and structure of some clandestine illegal network.

Functional Overview of Sequence, Association and Link Analysis

The SAL module is designed to carry out such tasks through an intuitive, user-friendly interface. Behind the scenes, it employs state-of-the-art techniques and computationally efficient, multi-threaded, highly scalable algorithms capable of providing solutions within a short period of time. Furthermore, this tool has the unique capability of handling continuous variables as well as categorical variables or items, and it enables the user to run both sequence and (non-sequence) association analyses on selected variables in a single analysis. These facilities are fully integrated into the STATISTICA platform, supporting a results interface specifically designed to provide the user with a wealth of further analysis tools. In fact, all the tools available in STATISTICA Data Miner can be quickly and effortlessly leveraged to analyze and "drill into" results generated via SAL.

Last but not least, SAL provides options for deployment, enabling you to quickly apply the rules extracted from historical data to make predictions (or "recommendations") about events (purchases) that are likely to happen next. Such models can conveniently be modified or deployed (e.g., in the STATISTICA Web-enabled client-server platform) at a later time with only a few clicks.
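Conceptually, deploying a rule set means matching the body of each rule against a customer's current basket and suggesting the heads of the rules that fire. The sketch below illustrates this under stated assumptions: the rule list, the confidence values, and the `recommend` function are all hypothetical and are not the deployment format or API of STATISTICA.

```python
# Hypothetical extracted rules: (body items, head item, confidence).
deployed_rules = [
    (frozenset({"flashlight"}), "batteries", 0.67),
    (frozenset({"tv"}), "warranty", 0.50),
    (frozenset({"tv", "cable"}), "wall mount", 0.40),
]

def recommend(basket, rules, top_n=2):
    """Return up to top_n head items whose rule body is contained in the basket."""
    basket = set(basket)
    best = {}
    for body, head, confidence in rules:
        # A rule fires only if all body items are present and the head is new.
        if body <= basket and head not in basket:
            best[head] = max(confidence, best.get(head, 0.0))
    # Rank by confidence (descending), breaking ties alphabetically.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [head for head, _ in ranked[:top_n]]

suggestions = recommend({"tv", "flashlight"}, deployed_rules)
print(suggestions)
```

The third rule does not fire for this basket because its body also requires "cable", so only batteries and a warranty are suggested.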

Highlights of Advanced and Unique Features of Sequence, Association and Link Analysis

  • The Novel Algorithm: Instead of the Apriori algorithm, the program uses a tree-building technique to extract Association and Sequence rules from data.
  • Database Technology: Uses efficient and thread-safe local relational Database technology to store Association and Sequence models.
  • Variable Handling: Can handle multiple response, multiple dichotomy and continuous variables in one analysis.
  • Multi-Tasking: Can perform Sequence analysis while also mining for Association rules in a single analysis.
  • Multidimensional Analysis: Simultaneously extracts Association and Sequence rules for more than one dimension.
  • Quantitative Attributes: Given its ability to perform multidimensional Association and Sequence mining and its capacity to extract rules for specific items only, the program can be used for Predictive Data Mining.
  • Clustering Analysis: The module can perform Hierarchical Single-Linkage Cluster analysis, which can detect the clusters of items that are most likely to occur together. This has extremely useful practical applications, for example in retailing.
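To illustrate the idea behind single-linkage clustering of items, the sketch below repeatedly merges the two clusters whose closest members are nearest to each other. It assumes a Jaccard distance based on how often two items share a basket. The data, the distance measure, and the function names are hypothetical choices made for illustration, not a description of the module's internals.

```python
from itertools import combinations

# Hypothetical toy transactions; each set is one market basket.
baskets = [
    {"flashlight", "batteries"},
    {"flashlight", "batteries"},
    {"bread", "milk"},
    {"bread", "milk", "batteries"},
]

def jaccard_distance(a, b, baskets):
    """1 minus the share of baskets containing both items among those with either."""
    with_a = {i for i, basket in enumerate(baskets) if a in basket}
    with_b = {i for i, basket in enumerate(baskets) if b in basket}
    return 1.0 - len(with_a & with_b) / len(with_a | with_b)

def single_linkage(items, dist, n_clusters):
    """Agglomerative clustering; stops when n_clusters clusters remain."""
    clusters = [frozenset([item]) for item in sorted(items)]
    while len(clusters) > n_clusters:
        # Single linkage: cluster distance is the smallest member-to-member distance.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: min(dist(a, b) for a in pair[0] for b in pair[1]),
        )
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
    return clusters

items = set().union(*baskets)
clusters = single_linkage(
    items, lambda a, b: jaccard_distance(a, b, baskets), n_clusters=2
)
print(clusters)
```

On this toy data the procedure groups bread with milk and flashlights with batteries, which is the kind of item cluster a retailer might use for shelf placement or bundled offers.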

Conclusion

Sequence, Association and Link Analysis addresses the needs of clients in retailing, banking, insurance, and other industries by implementing the fastest known highly scalable algorithm, with the ability to derive Association and Sequence rules in a single analysis. Furthermore, the program represents a stand-alone module that can be used for model building and deployment alike.