Multiple Join Conditions
If the source side has many rows and produces a prohibitively long SQL string on the target side, the TDV query engine deals with it in one of these ways:
(t1,t2) IN ((1,2), (6,3), (8,4),(93,3)…)
The TDV Server might generate a predicate with two or more scalar IN clauses like this:
t1 IN (1,6,8,93") and t2 IN (2,3,4,3,…).
The scalar IN clauses syntax is more compact than the OR syntax, and it reduces the chance of partitioned semijoin and load on the target data source. However, this predicate is not as selective as the OR form, so it could bring more rows into TDV. Because of this trade-off, you can configure the number of rows on the source side that causes the engine to switch from the OR syntax to multiple scalar IN clause syntax.
Consider this more complicated example:
SELECT * from DS1.R LEFT OUTER JOIN DS2.T on R.r1 = T.t1 and R.r2 = T.t2
In this case, the TDV Server query engine might generate an additional predicate (the row-constructor form of the IN clause) on the T-side that takes the form:
(t1,t2) IN ((1,2), (6,3), (8,4), (93,3)…)
If the DS2 target supports this form of IN clause query phrasing, it is preferred.
If the data source does not support this form of IN clause, the additional predicate looks like this:
(t1=1 and t2=2) or (t1=6 and t2=3) or (t1=8 and t2=4) or …)
The second form is logically identical to the IN-clause form, but if the right side produces many rows, it could generate a long SQL string on the target side, potentially exceeding the capabilities of DS2.
| • | Pushing left outer joins with multiple predicates—If the query engine can predetermine what is selectable based on an ON clause and a WHERE clause, it pushes the query to the data source. For example: |
SELECT
O10_Id
O10_CHAR
S_Id
S_CHAR
FROM /users/eg/t1 left outer { option semijoin=”false”, hash } join /users/sqlsr/5kkON O10_Id = S_Id
WHERE O10_Id = 9
Here, the WHERE clause predicate is the same as the ON clause predicate, so O10_Id = 9 can be pushed to the data source, which can dramatically reduce the amount of data returned.