Prefilters

A prefilter accelerates matching by quickly eliminating irrelevant records from consideration before performing the full, inexact matching calculation.
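The following minimal Python sketch shows the general two-stage idea: a cheap index lookup discards most records, and only the survivors reach the expensive inexact comparison. It is not how GIP, SORT, or PSI work internally. The q-gram inverted index, the names `QGramPrefilter`, `candidates`, `full_score`, and `search`, and the `min_shared` threshold are all hypothetical. The standard library's `difflib.SequenceMatcher` stands in for the server's inexact matching.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from difflib import SequenceMatcher


def qgrams(text: str, q: int = 3) -> set[str]:
    """Character q-grams of text, padded with '#' so short strings still produce grams."""
    pad = "#" * (q - 1)
    s = f"{pad}{text.lower()}{pad}"
    return {s[i:i + q] for i in range(len(s) - q + 1)}


class QGramPrefilter:
    """Hypothetical illustration: an inverted q-gram index used as a cheap first stage."""

    def __init__(self, records: list[str], q: int = 3):
        self.records = records
        self.q = q
        # Built once at load time: q-gram -> ids of records containing it.
        self.index: dict[str, set[int]] = defaultdict(set)
        for rid, rec in enumerate(records):
            for g in qgrams(rec, q):
                self.index[g].add(rid)

    def candidates(self, query: str, min_shared: int = 2) -> list[int]:
        """Ids of records sharing at least min_shared q-grams with the query."""
        counts: Counter[int] = Counter()
        for g in qgrams(query, self.q):
            for rid in self.index.get(g, ()):  # .get does not create new keys
                counts[rid] += 1
        return [rid for rid, n in counts.items() if n >= min_shared]


def full_score(a: str, b: str) -> float:
    """Stand-in for the expensive inexact comparison; returns a similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(pf: QGramPrefilter, query: str, top_n: int = 3) -> list[tuple[float, str]]:
    """Run the full comparison only on records that survive the prefilter, best first."""
    scored = [(full_score(query, pf.records[rid]), pf.records[rid])
              for rid in pf.candidates(query)]
    scored.sort(reverse=True)
    return scored[:top_n]


records = ["Jonathan Smith", "Jon Smyth", "Maria Garcia", "Wei Zhang", "John Smith"]
pf = QGramPrefilter(records)
print(len(pf.candidates("Jon Smith")), "of", len(records), "records reach the full comparison")
print(search(pf, "Jon Smith"))
```

In this toy example, "Maria Garcia" and "Wei Zhang" share no q-grams with the query, so they are never scored. The saving matters when the full comparison is far more expensive than the index lookup.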

The TIBCO Patterns server provides three prefilter technologies: GIP, SORT, and PSI. Each prefilter has its own advantages, so the best-suited prefilter depends on the application.

The table sizes referred to in the Prefilter Technologies table roughly correspond to the following record counts:

Small table: Less than 10 million records
Mid-sized table: 10 to 20 million records
Large table: Greater than 20 million records
Very large table: Greater than 50 million records
Note: These sizes are approximate; the actual boundaries vary widely depending on the content of the records and the nature of the application.
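The categories above overlap: every very large table is also a large table. The small helper below resolves a record count to the most specific category. The function name and the handling of the exact boundary values (20 million counts as mid-sized, 50 million as large) are assumptions made for illustration. As the note says, the real boundaries are rough.

```python
def table_size_category(num_records: int) -> str:
    """Hypothetical helper: map a record count to the approximate size categories.

    Because "very large" is a subset of "large", the most specific category wins.
    """
    if num_records < 10_000_000:
        return "small"
    if num_records <= 20_000_000:
        return "mid-sized"
    if num_records <= 50_000_000:
        return "large"
    return "very large"


for n in (5_000_000, 15_000_000, 30_000_000, 80_000_000):
    print(f"{n:>11,} records: {table_size_category(n)}")
```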

Prefilter Technologies

Use Case
GIP: The default prefilter, used by most applications. Excels at applications with a wide variety of queries.
SORT: Deduplication of mid-sized tables.
PSI: Deduplication of very large tables. Applications with a single fixed query against a large table where performance is critical.

Ease of Use
GIP: Very easy to use. No setup is needed because all searchable fields are indexed. No additional query information is needed.
SORT and PSI: The default indexes might work, but manual definition of indexes might be needed. Search lookup fields might need to be specified along with the query.

Query Time Performance
GIP: Good for small to mid-sized tables. Query times increase significantly on large tables (more than 20 million records).
SORT: Best query time performance of the three prefilters. Scales well with table size.
PSI: Query performance is similar to GIP for small to mid-sized tables, but scales better for large and very large tables.

Record Load and Update Performance
GIP: Excellent performance for all record operations.
SORT: Table loads, record additions, and record updates are slower than with GIP.
PSI: Table loads, record additions, and record updates are as fast as or slightly slower than with SORT.

Performance Tuning Options
GIP: Predicate filters and GPU accelerators can be used. With a suitable graphics card, GPU accelerators can increase performance by up to 4.5 times. In certain applications, where most records can be eliminated from consideration with a simple predicate test, using a predicate filter (see Predicates and Performance) can increase performance more than 10 times.
SORT and PSI: Change the set of indexes to improve query time performance. Reducing the number of indexes improves performance but might harm accuracy, so there is a tradeoff between accuracy and speed.

Query Accuracy
GIP: Excellent accuracy with a wide variety of queries and table sizes. For very large tables and queries with little information, special tuning of internal queue sizes might be needed.
SORT: For accurate results, SORT must have data from many different fields to work with. SORT does not give accurate results on large tables.
PSI: Works best when there is data from many different fields. More accurate than SORT, and similar in accuracy to GIP for many common query situations, but might not work well for queries on a single field or a small number of fields.

Memory Usage
GIP: The sizing rule of thumb is five times the data size.
SORT: Requires the least memory of the three prefilters.
PSI: Can use more memory than GIP, depending on the set of indexes defined.

Features Supported
GIP: Supports joins, variable attributes, predicate indexes, GPU acceleration, and thesaurus matching.
SORT and PSI: Do not support joins, variable attributes, predicate indexes, GPU acceleration, or thesaurus matching.
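The predicate filter tuning option listed for GIP relies on a cheap, exact test that removes most records before any inexact comparison runs. The following Python sketch illustrates only that general idea; it is not the TIBCO Patterns predicate API. The `Record` type, its `country` field, the `match` function, and the data are all made up for this example. The drop in comparisons it prints reflects only how selective the predicate is on this toy data. It is not a benchmark of the server.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass(frozen=True)
class Record:
    name: str
    country: str  # hypothetical field used by the predicate


def match(query_name: str, records: list[Record],
          predicate: Callable[[Record], bool] | None = None
          ) -> tuple[list[tuple[float, Record]], int]:
    """Score records against query_name, skipping any record the predicate rejects.

    Returns the scored results (best first) and the number of inexact comparisons made.
    """
    comparisons = 0
    results: list[tuple[float, Record]] = []
    for rec in records:
        if predicate is not None and not predicate(rec):
            continue  # cheap exact test: this record never reaches inexact matching
        comparisons += 1
        score = SequenceMatcher(None, query_name.lower(), rec.name.lower()).ratio()
        results.append((score, rec))
    # Sort on the score only, so Record objects are never compared with each other.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results, comparisons


countries = ["US", "DE", "FR", "JP", "BR", "IN", "GB", "CA", "MX", "ES"]
records = [Record(f"Customer {i}", countries[i % len(countries)]) for i in range(1000)]
records.append(Record("Jon Smith", "US"))

_, without = match("John Smith", records)
_, with_pred = match("John Smith", records, predicate=lambda r: r.country == "US")
print(f"inexact comparisons without predicate: {without}, with predicate: {with_pred}")
```

Here the predicate keeps only one record in ten, so roughly one tenth of the inexact comparisons are performed. A less selective predicate would save correspondingly less.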