Types of Joins
Inner and outer joins can use an equality condition (equijoins) or not (non-equi-joins). Most table joins are equi-joins, comparing columns and rows with like values. The nested loop algorithm is used for non-equi-joins.
The following table lists algorithms used and the types of joins that they can be used with.
Algorithm | Inner equi-join | Non-equi-join | Left/right outer join | Full outer join | Semijoin optimization |
Hash | yes | | yes | yes | yes |
Nested Loop | yes | yes | yes | yes | yes |
Sort Merge | yes | | yes | | |
TDV uses several algorithms to execute joins on table columns. By default, the TDV query engine attempts to find the best algorithm and join optimization to use for your SQL if none is specified. The Join Properties editor offers the following join algorithm options:
Automatic Option
By default, the TDV system automatically optimizes the SQL execution plans for any query, given estimates of left and right cardinality and other statistical information. Automatic is not a join algorithm or a join optimization, but it lets the TDV query engine optimize based on an analysis of the SQL using known database statistics.
You can specify the SQL execution plan explicitly with one of the other options; however, specification of a join option only influences what join algorithm or optimization is first attempted, not necessarily the one ultimately used. If the option specification is incompatible with the data sources or incorrectly specified, the execution plan might change to a compatible join that was not specified.
Hash Join Option
The HASH algorithm is useful when you are joining tables with large amounts of data. The optimizer uses the inner table to build a hash table on the join key in memory. It then scans the outer table, probing the hash table to find the joined rows.
The HASH algorithm requires only a single pass on each table. It is best used when the inner table fits into available memory.
However, if the hash table does not fit into available memory, the optimizer breaks it into partitions that are written to temporary segments on disk.
If the process is expected to write to disk, you can use the FORCE_DISK query hint (described in the TDV Reference Guide) to make the process write to disk from the beginning, saving read/write cycles and excessive memory usage.
NESTEDLOOP Join Option
You can use the NESTEDLOOP algorithm without any data structure information, but it is not inherently optimized for performance. NESTEDLOOP is useful when small subsets of data are joined and the join condition is an efficient way to access the right table.
When the outer (driving) table has numerous rows that force multiple probes of the inner table, the NESTEDLOOP algorithm becomes costly to run.
SORTMERGE Join Option
The SORTMERGE algorithm is a streaming operator. If both input streams can be ordered given the join criteria and compatibility, the join operator works efficiently and quickly in a small memory footprint.
SORTMERGE can also impose other requirements, such as both sides of the join being ordered. To meet that requirement, ordering must be compatible with the join criteria. For example, consider the following join criteria:
ON T1.A = T2.X AND T1.B = T2.Y
To perform the SORTMERGE join, the left side has to be put into ascending order by A and B, and the right side has to be put into ascending order by X and Y.
If a side is not ordered or does not have compatible ordering, the query engine automatically checks whether a compatible ORDER BY can be pushed to the data source so that the SORTMERGE join can proceed.
SORTMERGE is not compatible with the SQL when ORDER BY cannot be pushed to the data sources. Ordering must be performed on both sides for a SORTMERGE join.
To prevent the query engine from choosing SORTMERGE over HASH, specify the value of the SORTMERGE option as false, as follows:
{option SORTMERGE="false"}