User Guide > Performance Tuning > Semijoin Optimization Option > About the Semijoin Optimization
 
About the Semijoin Optimization
The semijoin optimization reduces the number of rows retrieved from the right-hand side (RHS) by rewriting the FETCH pushed to the second data source using the unique values returned by the left-hand side (LHS). Table cardinalities must be supplied for this optimization.
Inner joins can use the semijoin optimization. Left and right outer joins can use semijoin optimization, but only from the outer side to the inner side. The full outer join is not supported with the semijoin optimization.
You can specify use of the semijoin optimization in the SQL defining the view. When the table cardinalities of source and target sides are known, and when the ratio between the two is favorable, the TDV query engine automatically uses the semijoin optimization.
The semijoin can only be attempted if the right side can be queried as a node that fetches against a data source that supports an IN or an OR clause.
For inner and left outer joins, the semijoin optimization always uses rows from the LHS to constrain the RHS—for example, where LHS is the source side and RHS is the target. For a right outer join the situation is reversed: RHS is the source and LHS is the target.
As another example, consider the tables Employee and Dept and their semijoin.
Employee Name
EmpId
DeptName
Harry
3415
Finance
Sally
2241
Sales
George
3401
Finance
Harriet
2202
Production
 
Dept DeptName
Manager
Sales
Bob
Sales
Thomas
Production
Katie
Production
Mark
The semijoin of the employee and department tables would result in the following table.
Employee joined to Dept Name
EmpId
DeptName
Sally
2241
Sales
Harriet
2202
Production
Consider another query like this:
SELECT * FROM DS1.R INNER JOIN DS2.T ON R.r = T.t
 
The usual evaluation would hash the R table expression and then iterate over T table expressions, and do look-ups into the hash table to find matching rows. If the join in the example is such that the number of rows from the R-side is much smaller than the number of rows from the T-side, additional optimization is possible. The rows from R are buffered in TDV, but an additional predicate of the form “t IN (1,6,8,93…)” is added to the SQL generated for the RHS. The values on RHS of the IN clause are all the ‘r’ values from LHS of the join. This additional predicate ensures that TDV retrieves the smallest possible number of rows from the T-side to perform the join.