The Importance of “Patterns” to Enterprises

In today’s globally networked world, enterprise data comes from many sources and is used by many different people, organizations, and systems. Data is never 100% perfect, consistent, or complete, and it never can be. Human beings are good at spotting the relevant patterns embedded in information, and so can make sense of imperfect data despite inconsistencies, errors, differences in representation, and so forth.

The last several decades have seen the development of infrastructure to capture, transport, and store enterprise data in ever-growing repositories. Traditionally, SQL queries have been used to retrieve and analyze this data retrospectively. SQL queries can be efficient and work adequately when the data is clean and you know precisely what data you need. In the real world, where neither the data nor the queries are reliably consistent or correct, what happens when exact matching with SQL queries cannot find the data you need?
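As a small illustration of that failure mode, the sketch below runs an exact-match SQL query against an in-memory SQLite database. The table, column names, and sample rows are hypothetical and invented for this example; they are not taken from any TIBCO Patterns schema.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)",
    [("Katherine Johnson",), ("Catherine Jonson",), ("Johnson, Katherine",)],
)


def exact_match(name):
    """Return the stored names that equal `name` character for character."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE name = ? ORDER BY id", (name,)
    ).fetchall()
    return [row[0] for row in rows]


exact_hits = exact_match("Katherine Johnson")  # only the identical spelling
typo_hits = exact_match("Kathrine Johnson")    # one dropped letter: no rows
print(exact_hits, typo_hits)
```

All three stored rows plausibly describe the same person, yet the exact query returns at most one of them, and a query with a single dropped letter returns none.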

The underlying problem is gauging the similarity of different representations of the same entity. Dealing with imperfect data effectively requires an accurate measure of that similarity, one that judges it the way human beings do.

Several techniques, some of them very old, have been developed to deal with imperfect data. All suffer from significant limitations and weaknesses. Some methods, like the century-old Soundex algorithm, are applicable and reliable only for certain kinds of data, such as names of people. Some, like wildcard or substring matching, require many queries to produce a result. Some, like string edit distance, are too computationally expensive to handle variations outside a tiny edit window. But all of these methods share the fundamental defect of an inadequate concept of similarity: it is easy to construct plausible variations and misspellings for which these mainstream approaches perform poorly or fail altogether. The way systems deal with imperfect data is long overdue for an overhaul.
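To make these limitations concrete, here is a minimal sketch of two of the techniques named above: classic Soundex and Levenshtein edit distance. It is a textbook illustration only and says nothing about how TIBCO Patterns computes similarity. The sample names and the EDIT_WINDOW threshold are assumptions chosen for the example.

```python
def soundex(name):
    """Classic four-character Soundex code: first letter plus three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Same-sounding names get different codes because the first letter is kept.
same_sound = (soundex("Katherine"), soundex("Catherine"))
# Two different names collapse to one code.
different_names = (soundex("Robert"), soundex("Rupert"))

EDIT_WINDOW = 2  # assumed threshold, for illustration only
typo = edit_distance("Katherine Johnson", "Kathrine Johnson")
reordered = edit_distance("Katherine Johnson", "Johnson, Katherine")
print(same_sound, different_names, typo, reordered > EDIT_WINDOW)
```

In this sketch Soundex both separates two spellings of one name and merges two different names, while edit distance accepts the single-letter typo but places the reordered "Johnson, Katherine" outside the small window, even though a person would recognize it immediately as the same entity.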

TIBCO Patterns provides applications with a much richer, more “human” measure of similarity that enables them to take the appropriate course of action. High-performance, truly inexact pattern matching is needed to deal effectively with scenarios such as the following:

Finding groups of possible duplicate medical records in a Master Patient Index.
Detecting fraudulent financial transactions or abuse of entitlement claims in government programs.
Providing a consumer with a list of likely products in response to an imperfectly formed query.