Predicates and Performance
Filtering predicates are a useful complement to inexact matching. However, they can cause performance issues that seem counterintuitive: because of the interaction between the inexact and exact selection criteria, a query with a highly selective predicate can take much longer to process than a query with no predicate, or with a predicate that filters out only a small percentage of records.
Fortunately, predicate partition indexes address the performance issues caused by very selective predicates. When used correctly, partition indexes can greatly improve query throughput in such cases. Partition indexes work only with the GIP prefilter. See Prefilters and Scaling for a description of prefilters.
Partition Indexes
A partition index is a special data structure associated with a field of an in-memory table that speeds processing when using filtering predicates that make comparisons with the value of that field. Though a given field can have multiple partition indexes, a partition index is always associated with exactly one field, which can be of any field type. Partition indexes are defined for a table when the table is created, and can neither be added nor removed subsequently. Any number of indexes can be defined for a table.
A partition index is one of two types: primary or secondary. A primary index provides greater improvements in speed, especially for large tables and highly selective filtering predicates, but at the cost of significantly higher memory usage to store the index. A secondary index provides much less improvement than a primary index, but at low memory cost.
Both types of indexes work by partitioning records into groups based on the value of the field. This allows the search to be constrained from the start to those groups (partitions) for which the predicate expression might evaluate to true, skipping the partitions for which it cannot. You define a partition index (of either type) by defining these groups, essentially by specifying an upper limit value for each group.
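The grouping step can be illustrated with a minimal Python sketch. It assumes that each partition is described only by an inclusive upper limit and that the limits are sorted in ascending order; the function names `partition_for` and `build_partitions` are hypothetical and not part of the product. Values above the last limit are given the index `len(upper_limits)` purely for illustration, since how the product treats such values is not described here.

```python
from bisect import bisect_left


def partition_for(value, upper_limits):
    """Return the index of the partition that holds `value`.

    Each partition is described only by its inclusive upper limit, and
    upper_limits must be sorted ascending. A value belongs to the first
    partition whose upper limit is >= value. Values above the last limit
    get index len(upper_limits) (an illustrative assumption).
    """
    return bisect_left(upper_limits, value)


def build_partitions(values, upper_limits):
    """Group values by partition index: {partition_index: [values]}."""
    groups = {}
    for v in values:
        groups.setdefault(partition_for(v, upper_limits), []).append(v)
    return groups


if __name__ == "__main__":
    limits = [10, 20, 30]  # hypothetical upper limits for three partitions
    print(build_partitions([3, 10, 11, 25, 30, 42], limits))
    # {0: [3, 10], 1: [11], 2: [25, 30], 3: [42]}
```

Because a value equal to an upper limit (such as 10 or 30) falls into that limit's partition, a binary search over the sorted limits is enough to place every value.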
For example, to index a Date of Birth field you can define December 31st of each year from 1900 to 2100 as the partitions, separating the dates of birth by year (future years are included so that the table does not have to be redefined each year). The partition values would be: "12/31/1900", "12/31/1901", "12/31/1902", ... "12/31/2099", "12/31/2100". If your filtering predicate is:
DATE "6/7/1985" <= $"dob" AND $"dob" <= DATE "6/7/1987"
then the partition index allows the search to skip all records except those in the partitions for "12/31/1985", "12/31/1986", and "12/31/1987".
Primary indexes provide the greatest advantage on very large tables where the filtering predicate selects only a small percentage of the total number of records. Generally, if a predicate expression filters out more than 95 percent of all records, a primary index provides a large advantage over a secondary index or no index. If the predicate expression generally filters out less than 90 percent of records, a secondary index might be suitable. An index might not be needed at all if the predicate expression filters out only a few percent of records.
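The partition pruning in the Date of Birth example can be checked with the following sketch. It assumes inclusive upper limits, as in the example, and uses a hypothetical function `partitions_for_range`; it only computes which partitions a range predicate needs to scan and does not model how the engine stores or searches records.

```python
from bisect import bisect_left
from datetime import date

# Partition upper limits from the Date of Birth example:
# December 31st of each year from 1900 through 2100 (inclusive limits).
DOB_LIMITS = [date(year, 12, 31) for year in range(1900, 2101)]


def partitions_for_range(low, high, upper_limits):
    """Return indexes of partitions that may hold values v with low <= v <= high.

    Every other partition can be skipped: none of its values can
    satisfy the range predicate.
    """
    first = bisect_left(upper_limits, low)
    last = bisect_left(upper_limits, high)
    # Clamp to the defined partitions; values beyond the last limit
    # are outside the scope of this sketch.
    last = min(last, len(upper_limits) - 1)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # DATE "6/7/1985" <= $"dob" AND $"dob" <= DATE "6/7/1987"
    kept = partitions_for_range(date(1985, 6, 7), date(1987, 6, 7), DOB_LIMITS)
    print([DOB_LIMITS[i].isoformat() for i in kept])
    # ['1985-12-31', '1986-12-31', '1987-12-31']
    print(f"scan {len(kept)} of {len(DOB_LIMITS)} partitions")
    # scan 3 of 201 partitions
```

For this predicate only 3 of the 201 partitions need to be searched, which matches the three partitions named in the example.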
A few additional points related to partition indexes are:
| • | A larger number of partitions makes the index more selective and thus yields greater performance improvements, up to a limit of 1021 partitions. In general, it is best to use more partitions, but increasing the number of partitions also increases memory usage, especially with a primary index. |
| • | The first primary index on a table involves a moderate amount of overhead. Each additional primary index on a table after the first incurs a very large memory overhead penalty. (A child table can have one or more primary indexes, but on a child table even the first primary index incurs a very large memory overhead penalty.) If you need multiple partition indexes for a table, make the index on the most selective or most frequently used field the primary index, and make the others secondary. A table should have only one primary index unless there is a very compelling reason to have more and you are willing to pay a very high price in memory usage. A parent table can have at most one primary index. |
| • | The IN, SUPERSET, SUBSET, and predicate function operators cannot use partition indexes to speed their evaluation. |
| • | Only predicate expressions comparing native (unconverted) values of the field can use the index. For example, if a birth date value is stored in the table as a text field rather than a date field, a predicate expression of the form: |
DATE $"birthday" > DATE "12/7/1986"
cannot use the index, because the field value is converted to DATE before it is compared. (If dates are to be used in predicate comparisons, store them in fields of the date field type.)
| • | If any sub-expression of an OR operator cannot use a partition index (for example, because it uses the IN operator), the OR expression as a whole cannot use the partition index either. |
| • | Certain complex expressions, especially those that combine the NOT operator with the OR operator, might not be able to use a partition index. If your application requires a complex filtering predicate, consult TIBCO Support to select the indexes and predicate expressions best suited to your needs. |
| • | A joined search cannot use a primary index on a child table. However, a non-joined search can use a primary index on a child table. |
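The index-eligibility rules in the list above can be summarized as a small recursive check over a predicate tree. This is a minimal Python sketch under stated assumptions: the expression classes and `can_use_index` are hypothetical and not part of any product API, the handling of AND (eligible if at least one conjunct is) and NOT (treated conservatively as ineligible) is assumed rather than documented, and the joined-search restriction on child tables is not modeled.

```python
from dataclasses import dataclass

# Hypothetical, simplified predicate tree used only to illustrate the rules.


@dataclass
class Compare:
    field: str
    op: str                  # e.g. "<", "<=", "=", ">=", ">"
    converted: bool = False  # True if the field value is converted, e.g. DATE $"birthday"


@dataclass
class Membership:
    field: str
    op: str                  # "IN", "SUPERSET", or "SUBSET"


@dataclass
class FunctionCall:
    name: str                # a predicate function


@dataclass
class Or:
    parts: list


@dataclass
class And:
    parts: list


@dataclass
class Not:
    part: object


def can_use_index(expr, indexed_field):
    """Return True if `expr` could be narrowed by a partition index on `indexed_field`."""
    if isinstance(expr, Compare):
        # Only comparisons on the native (unconverted) field value qualify.
        return expr.field == indexed_field and not expr.converted
    if isinstance(expr, (Membership, FunctionCall)):
        # IN, SUPERSET, SUBSET, and predicate functions never use the index.
        return False
    if isinstance(expr, Or):
        # An OR qualifies only if every sub-expression qualifies.
        return all(can_use_index(p, indexed_field) for p in expr.parts)
    if isinstance(expr, And):
        # Assumption: one qualifying conjunct is enough to narrow the search.
        return any(can_use_index(p, indexed_field) for p in expr.parts)
    if isinstance(expr, Not):
        # Assumption: NOT is treated conservatively as not index-eligible.
        return False
    raise TypeError(f"unknown expression: {expr!r}")


if __name__ == "__main__":
    dob_range = And([Compare("dob", ">="), Compare("dob", "<=")])
    print(can_use_index(dob_range, "dob"))  # True
    text_dob = Compare("birthday", ">", converted=True)
    print(can_use_index(text_dob, "birthday"))  # False
    mixed_or = Or([Compare("dob", "<"), Membership("dob", "IN")])
    print(can_use_index(mixed_or, "dob"))  # False
```

The conservative treatment of NOT reflects the caution above about complex expressions; for real applications the actual behavior should be confirmed rather than inferred from a sketch like this.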