Prefilters
A prefilter accelerates matching by quickly eliminating irrelevant records from consideration before performing the full, inexact matching calculation.
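The following Python sketch illustrates this two-stage idea in general terms. It does not show how GIP, SORT, or PSI work internally. A character-trigram inverted index serves as a stand-in prefilter: a record reaches the expensive edit-distance comparison only if it shares at least `min_shared` trigrams with the query. The trigram technique and all names in the sketch (`TrigramPrefilter`, `similarity`, `min_shared`) are assumptions chosen for illustration. They are not TIBCO Patterns APIs.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Character trigrams of a lower-cased string, padded so word edges count."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance: the 'expensive' full comparison."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity score in [0, 1]."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


class TrigramPrefilter:
    """Hypothetical stand-in prefilter (not GIP, SORT, or PSI)."""

    def __init__(self, records: list[str]):
        self.records = records
        self.index: dict[str, set[int]] = defaultdict(set)
        for rid, text in enumerate(records):
            for gram in trigrams(text):
                self.index[gram].add(rid)

    def candidates(self, query: str, min_shared: int = 3) -> set[int]:
        """Cheap stage: keep records sharing at least min_shared trigrams."""
        counts: dict[int, int] = defaultdict(int)
        for gram in trigrams(query):
            for rid in self.index.get(gram, ()):
                counts[rid] += 1
        return {rid for rid, n in counts.items() if n >= min_shared}

    def search(self, query: str, top_n: int = 3) -> list[tuple[str, float]]:
        """Expensive stage: score only the records that survived the prefilter."""
        scored = [(self.records[rid], similarity(query, self.records[rid]))
                  for rid in self.candidates(query)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_n]


records = ["John Smith", "Jon Smyth", "Mary Jones", "Peter Parker"]
prefilter = TrigramPrefilter(records)
print(prefilter.search("Jon Smith"))
```

In this example, "Mary Jones" and "Peter Parker" share too few trigrams with the query to become candidates. The edit-distance computation therefore runs only for the two Smith records. Setting the threshold too strictly can drop true matches, which is the accuracy-versus-performance trade-off that each prefilter balances differently.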
The TIBCO Patterns server provides three prefilter technologies: GIP, SORT, and PSI. Each prefilter has different strengths, so the best-suited prefilter depends on the application.
The table sizes referred to in the Prefilter Technologies table correspond roughly to the following record counts:
- Small table: fewer than 10 million records
- Mid-sized table: 10 to 20 million records
- Large table: more than 20 million records
- Very large table: more than 50 million records
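These ranges overlap at the top end, because a table of more than 50 million records is both "large" and "very large". The following minimal Python sketch returns the more specific label. It assumes that the 10 to 20 million range includes both endpoints, which the text does not specify. `table_size_category` is a hypothetical helper and is not part of TIBCO Patterns.

```python
MILLION = 1_000_000


def table_size_category(record_count: int) -> str:
    """Hypothetical helper mapping a record count to this section's size labels.

    Assumes the mid-sized range (10 to 20 million) is inclusive at both ends.
    Tables over 50 million records are also "large"; the more specific
    "very large" label is returned for them.
    """
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count < 10 * MILLION:
        return "small"
    if record_count <= 20 * MILLION:
        return "mid-sized"
    if record_count <= 50 * MILLION:
        return "large"
    return "very large"


for count in (2_500_000, 15_000_000, 35_000_000, 80_000_000):
    print(f"{count:>12,} records -> {table_size_category(count)}")
```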
Prefilter Technologies

| Criteria | GIP | SORT | PSI |
| --- | --- | --- | --- |
| Use Case | The default prefilter, used by most applications. Excels at applications with a wide variety of queries. | Deduplication of mid-sized tables. | Deduplication of very large tables. Applications that run a single fixed query against a large table where performance is critical. |
| Ease of Use | Very easy to use. No setup is needed: all searchable fields are indexed, and no additional query information is required. | The default indexes might be sufficient, but indexes might need to be defined manually. Search lookup fields might also need to be specified with the query. | Same as SORT. |
| Query Time Performance | Good for small to mid-sized tables. Query times increase significantly on large tables (more than 20 million records). | Best query time performance of the three prefilters. Scales well with table size. | Similar to GIP for small to mid-sized tables, but scales better for large and very large tables. |
| Record Load and Update Performance | Excellent performance for all record operations. | Table loads, record adds, and record updates are slower than with GIP. | Table loads, record adds, and record updates are the same as or slightly slower than with SORT. |
| Performance Tuning Options | Predicate filters and GPU accelerators can be used. With a suitable graphics card, GPU accelerators can increase performance by up to 4.5 times. In applications where most records can be eliminated with a simple predicate test, a predicate filter (see Predicates and Performance) can increase performance by more than 10 times. | Change the set of indexes to improve query time performance. Reducing the number of indexes improves performance but might reduce accuracy. | Same as SORT. |
| Query Accuracy | Excellent accuracy with a wide variety of queries and table sizes. For very large tables and queries with little information, internal queue sizes might need special tuning. | For accurate results, SORT needs data from many different fields. SORT does not work well on large tables. | Works best when data from many different fields is available. More accurate than SORT. Accuracy is similar to GIP in many common query situations, but PSI might not work well for queries on a single field or a small number of fields. |
| Memory Usage | As a rule of thumb, GIP needs about five times the data size in memory. | Requires the least memory of the three prefilters. | Can use more memory than GIP, depending on the set of indexes defined. |
| Features Supported | Supports joins, variable attributes, predicate indexes, GPU acceleration, and thesaurus matching. | Does not support joins, variable attributes, predicate indexes, GPU acceleration, or thesaurus matching. | Same as SORT. |
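The predicate-filter tuning option for GIP relies on a simple principle: a cheap, exact test removes most records before any inexact comparison runs. The following Python sketch illustrates that principle in general terms. `Record`, its `country` field, `fuzzy_score`, and `search` are hypothetical. Python's `difflib.SequenceMatcher` stands in for the server's inexact matching. The sketch does not reflect TIBCO Patterns predicate syntax; for that, see Predicates and Performance.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Record:
    """Hypothetical record layout used only for this illustration."""
    name: str
    country: str


def fuzzy_score(a: str, b: str) -> float:
    """Stand-in for the full inexact match: difflib similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(records: list[Record], query_name: str, query_country: str,
           use_predicate: bool = True) -> tuple[list[tuple[str, float]], int]:
    """Return (ranked matches, number of records that were fuzzy-scored)."""
    scored = 0
    results = []
    for rec in records:
        # Cheap exact predicate first: a rejected record never reaches
        # the expensive inexact comparison below.
        if use_predicate and rec.country != query_country:
            continue
        scored += 1
        results.append((rec.name, fuzzy_score(query_name, rec.name)))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results, scored


records = [
    Record("John Smith", "US"),
    Record("Jon Smyth", "GB"),
    Record("Jane Smithers", "US"),
    Record("Juan Smit", "ES"),
]
matches, scored = search(records, "Jon Smith", "US")
print(scored, matches)
```

With the predicate enabled, only the two US records are scored. Without it, all four records are scored. The speedup grows with the fraction of records that the predicate rejects, which is why the benefit varies so much between applications.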