Performance Considerations

In addition to the standard factors of table size and query complexity, you need to consider a number of other factors for query performance in a joined search.

Fanout and Query Performance

A joined query is typically a query across the fully denormalized representation of the parent and child tables involved in the query. Query performance is most closely tied to the size of this (virtual) fully denormalized table.

If there is a single child table, the typical size of the fully denormalized table is the size of the child table (if there are no orphans).
When there is more than one child table, it becomes more difficult to determine the size of the fully denormalized table. The size depends on the number of possible child record combinations for each parent record. If parent records tend to have many child records in several child tables, the denormalized table might be too large to be searchable.

In practice, the size of the denormalized table is determined by the typical or average fanout of the parent table records. The fanout is the number of child record combinations for the same parent record, where each child record in a combination is taken from a different child table.

In the example mentioned earlier, the fanout for each parent record is:

Fanout

Person ID   Addresses # of children   Phones # of children   Fanout
1           3                         2                      6 (3x2)
2           1                         0                      1
3           1                         1                      1 (1x1)
Average     5 / 3 = 1.67              3 / 3 = 1              8 / 3 = 2.67

If you know the average fanout, the query cost is roughly proportional to the average fanout multiplied by the size of the parent table.
If you do not know the average fanout, a crude estimate can be given by multiplying the average number of child records for the same parent record in the individual child tables. The average number of child records that a parent record has in a child table is calculated by dividing the number of child records in that child table by the number of parent records.

In the example, the average fanout estimated in this way is 1.67 * 1 = 1.67. This is a lower bound, because it assumes that the fanout for all parent records is identical. In the example, the fanouts of the individual records are highly uneven, so the actual average fanout (2.67) is considerably larger. This must be considered when estimating the size of the denormalized table.
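The two calculations above can be reproduced with a short script. This is a minimal sketch, not part of the TIBCO Patterns API: the function names are made up for illustration, and it assumes (as the row for person 2 in the table suggests) that a child table with no records for a parent still contributes a factor of 1 to that parent's fanout.

```python
from math import prod

# Child-record counts per parent, copied from the example table.
# Keys are person IDs; values are (addresses, phones).
child_counts = {1: (3, 2), 2: (1, 0), 3: (1, 1)}

def fanout(counts):
    """Number of child-record combinations for one parent record.

    Assumption: a child table with no records for this parent counts
    as a factor of 1, matching the row for person 2 in the table.
    """
    return prod(max(c, 1) for c in counts)

def average_fanout(table):
    """Exact average fanout across all parent records."""
    return sum(fanout(c) for c in table.values()) / len(table)

def estimated_fanout(table):
    """Crude estimate: product of the per-table average child counts."""
    n_parents = len(table)
    n_tables = len(next(iter(table.values())))
    return prod(
        sum(c[i] for c in table.values()) / n_parents
        for i in range(n_tables)
    )

exact = average_fanout(child_counts)
estimate = estimated_fanout(child_counts)
print(f"exact average fanout: {exact:.2f}")   # 8 / 3
print(f"crude estimate:       {estimate:.2f}")  # (5 / 3) * (3 / 3)
```

For the example data the exact average is 8 / 3 and the crude estimate is 5 / 3, which shows how much an uneven distribution of child records can widen the gap between the two.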

All of these methods give only a rough estimate of query performance. A query that finds a parent record with an unusually large fanout can be much slower than a query that does not find such a record. The worst-case performance of a query can be estimated by executing a query that matches and returns the parent record that has the maximum fanout.
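To build such a worst-case test query, you first need to know which parent record has the maximum fanout. The following sketch finds it from per-parent child counts. The helper names are hypothetical, the counts are assumed to have been collected from your own tables beforehand, and an empty child table is assumed to contribute a factor of 1.

```python
from math import prod

def fanout(counts):
    """Child-record combinations for one parent (empty child tables count as 1)."""
    return prod(max(c, 1) for c in counts)

def max_fanout_parent(table):
    """Return (parent_key, fanout) for the parent record with the largest fanout."""
    key = max(table, key=lambda k: fanout(table[k]))
    return key, fanout(table[key])

# Counts from the example: person ID -> (addresses, phones).
child_counts = {1: (3, 2), 2: (1, 0), 3: (1, 1)}

worst_key, worst_fanout = max_fanout_parent(child_counts)
print(worst_key, worst_fanout)
```

In the example, person 1 has the largest fanout (6), so a query that matches and returns that record approximates the worst case.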

Warning: Adding child tables to a query greatly impacts query performance, especially if these tables are large compared to the parent table. You should be very careful of high fanout factors when there is more than one child table.

 

Single Parent Search

The following performance considerations are applicable when using a single parent search.

Consider a single-parent mode query with an output size of 20 records. This query asks for the best combination of child records for each of the 20 returned parent records. The best combination for a parent record is known only at the end of query processing, so many additional combinations must be carried through and analyzed in all earlier phases. Thus, a large fanout factor can greatly increase the amount of time and memory required to process the query.

Note: Be careful with single-parent searches on multiple child tables with high fanout, which can greatly impact query performance.

Example: Consider a query with four child tables and a parent record that has 1,000 child records in each of these four child tables. There are 1,000 x 1,000 x 1,000 x 1,000 = 1 trillion child record combinations for this parent record. To analyze all these combinations, the TIBCO Patterns server would consume all of the available memory. (Usually it filters most of these combinations out before actually generating them, but this is not always the case.)

To deal with records that have extremely large fanouts, the TIBCO Patterns server sets upper bounds on the number of combinations of child records. These are hard limits: exceeding either of them causes the query to fail.

All combinations of child records for each parent record are generated, and two hard limits are applied. The first limit is on the fanout of a single parent record (which is set at 1,048,576 by default). If any parent record encountered in a search exceeds this maximum fanout limit, the entire query fails.

This is called the single-fanout-limit. It applies to both single parent mode and multi-parent mode searches.

The second limit is on the total number of combinations returned from an early query processing stage (which is set at 100,000 by default). The query fails if the total number of combinations exceeds this limit. This is called the total-fanout-limit. It applies only to single-parent mode searches.

The single-fanout-limit is higher than the total-fanout-limit because the single-fanout-limit applies at a much earlier phase in the query processing. The cost of processing a record in the earlier phase is much lower. Most records are filtered out before the total-fanout-limit is applied.
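If you have per-parent child counts available, you can flag in advance the parent records that could trip the single-fanout-limit. The sketch below is an offline data check under stated assumptions, not the server's own logic: the function names and the sample data are hypothetical, the limit is the default quoted above, and an empty child table is assumed to contribute a factor of 1. The total-fanout-limit is not checked here because it applies to combinations that survive an early processing stage, which cannot be derived from raw counts.

```python
from math import prod

# Default single-fanout-limit as stated in the text.
SINGLE_FANOUT_LIMIT = 1_048_576

def fanout(counts):
    """Child-record combinations for one parent (empty child tables count as 1)."""
    return prod(max(c, 1) for c in counts)

def parents_over_single_limit(table, limit=SINGLE_FANOUT_LIMIT):
    """Return the sorted keys of parent records whose fanout exceeds the limit."""
    return sorted(key for key, counts in table.items() if fanout(counts) > limit)

# Hypothetical data: child counts per parent across four child tables.
table = {
    "p1": (1_000, 1_000, 1_000, 1_000),  # 10**12 combinations
    "p2": (32, 32, 32, 32),              # exactly 1,048,576: at the limit, not over it
    "p3": (3, 2, 0, 1),                  # 6 combinations
}

offenders = parents_over_single_limit(table)
print(offenders)
```

Only "p1", the record from the four-table example, is reported. Whether such a record actually causes a failure depends on whether a query encounters it.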

The following limits are set by using the -J command line option:

The single-fanout-limit, using the single-parent-rec parameter of the -J command line option.
The total-fanout-limit, using the all-parent-recs parameter of the -J command line option. This limit must always be greater than the GIP output size limit. The default GIP output size limit is the greater of 2,000 or 37 multiplied by the number of matches requested.

For more information, see the TIBCO Patterns Installation guide.
Note: You can also override the default values for a particular query. Consult a TIBCO representative before modifying these values.
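The constraint between the total-fanout-limit and the GIP output size limit is simple arithmetic, sketched below. The function names are hypothetical and the formula is taken directly from the description above; the script only checks candidate values and does not set anything on the server.

```python
def default_gip_output_limit(matches_requested):
    """Default GIP output size limit: the greater of 2,000 or 37 x matches requested."""
    return max(2_000, 37 * matches_requested)

def total_limit_is_valid(total_fanout_limit, matches_requested):
    """The total-fanout-limit must be strictly greater than the GIP output size limit."""
    return total_fanout_limit > default_gip_output_limit(matches_requested)

for matches in (20, 100, 3_000):
    gip = default_gip_output_limit(matches)
    ok = total_limit_is_valid(100_000, matches)
    print(f"matches={matches}: GIP limit={gip}, default total-fanout-limit valid={ok}")
```

With the default total-fanout-limit of 100,000, a request for 20 or 100 matches satisfies the constraint, while a request for 3,000 matches (GIP limit 111,000) does not.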

 

Some additional considerations:

If a joined set of tables has one or more records with an extraordinarily large fanout:

Queries might have poor performance.
Queries might fail due to exceeding the single-fanout-limit. If this happens, examine the TIBCO Patterns console log file. A warning message giving the parent record key of the offending record is added to the log whenever the single-fanout-limit is exceeded.

If a joined set of tables has a very large average fanout factor:

Queries might have poor performance.
Single-parent mode queries might fail due to exceeding the total-fanout-limit. To get around this, you can increase the total-fanout-limit. However, increasing the total-fanout-limit can degrade performance.

Matching Compound Records

When matching compound input records, performance is impacted if the input record has a large number of child records. The standard approach to matching compound records adds a separate querylet for each input child record. When the number of input child records is very large, this can lead to a query with hundreds or thousands of querylets. Such queries can become very expensive to process. In extreme cases, the TIBCO Patterns server might become unresponsive or it might run out of memory and terminate.

To prevent such issues, a limit is enforced on the total number of leaf querylets in a query (1,000 by default). Only score generator querylets count towards this limit. The limit can be set only at server startup, by using the max-querylets parameter of the -J command line option. For more information, see the TIBCO® Patterns Installation guide.