Star Schema Semijoin

Here the ON clauses of the joins are such that both D1 and D2 are connected to F, and both joins are inner joins. In this case two semijoins are possible—one from D1 to F and the other from D2 to F—because they both target the same data source.

If the data source is sufficiently robust, the data source can be marked in TDV Server so that the query engine is aware that the data source supports star schema semijoin optimization for multiple join nodes. In Studio, the Info > Advanced tab of the data source configuration pane has a Supports Star Schema property.

When the Supports Star Schema check box is enabled, the TDV query engine is made aware that both joins can be run using the semijoin optimization.

The reason this needs to be explicitly enabled is that even one semijoin with a large source side can place a significant burden on the target data source, and if several joins target the same data source at the same time the burden can overwhelm some data sources. It is easier for star schema semijoin to generate SQL strings that exceed the capabilities of the target data source. Thus this setting is useful when the target data source is very powerful or when all source sides are fairly small.

If the SQL sent to the target side in a star schema query exceeds the maximum length of SQL for the data source, the engine runs with a partitioned semijoin if possible, or it disables semijoin optimization for some of the joins.

The partitioned semijoin might not always apply to a star schema semijoin. If there are n joins targeting F and all but one of them generate a short IN clause but one generates a long IN clause, the query engine can still partition the long side of the semijoin. If all the joins produce IN clauses of approximately equal size partitioning might not be useful. In this case, the TDV query engine disables the semijoin optimization for the join that generates the longest SQL string. It continues to disable semijoin optimization until the total SQL for all remaining joins is within the capabilities of the target data source or partitioning becomes an option.

Because of the star schema semijoin, at most one of the predicates can be partitioned. If the query specifies more than one join with the partition_size option, at most only one of these requests can be satisfied.

For information about settings, see Setting Semijoin Configuration Parameters.

You can override settings for specific data sources.

Data-source overrides apply when a given data source is the target of a semijoin. Typically these values are set conservatively at the server level, and overridden if necessary for specific data sources.

The target data source has the burden of processing the potentially large predicates generated by a semijoin, and the target data source must impose limits on how big the query predicate from the source side can be.