Reference: |
By default, when joining cluster synonyms, a hierarchy of segments is constructed from all of the joined files, and the resulting hierarchy is navigated in top-to-bottom, left-to-right order.
Therefore, if a left outer join is specified from a host synonym to a cluster that has an inner join, the inner join will be performed last and may remove rows from the host file, counteracting the purpose of the left outer join. Using the SET FOCTRANSFORM = NESTED_CLUSTERS/ON command, you can force the joins in the target cluster to be performed prior to the join between the host and target synonyms. When you use this setting, SQL scripts are used to join the tables in the target cluster prior to implementing the join to the host file. The left outer join will be performed last and will retain all rows in the host synonym.
The syntax is:
SET FOCTRANSFORM = {NESTED_CLUSTERS/OFF|NESTED_CLUSTERS/ON}
where:
Maintains the left-to-right, top-to-bottom order of segment navigation. This is the default value.
Performs the joins in the target cluster synonym prior to joining the host synonym to the result.
(FOC906) JOIN TO NON-ROOT SEGMENT segname IS NOT ALLOWED FOR NESTED_CLUSTERS
This example uses SQL Server data sources generated from a file of citibike trips uploaded from https://www.citibikenyc.com/system-data, and from a file of zip codes for the stations used for the trips (you can download this file from https://techsupport.informationbuilders.com/public/station_zip.csv).
A cluster synonym named station_trip_cls joins the station zip data source to a data source containing partial trip data (with only a few rows). The following shows the inner join defined in the cluster synonym:
FILENAME=STATION_TRIP_CLS, $ SEGMENT=STATION_ZIP_OLEDB, CRFILE=CITIBIKE/STATION_ZIP_OLEDB, CRINCLUDE=ALL, $ SEGMENT=CITIBIKE_PARTIAL_OLEDB, SEGTYPE=KU, PARENT=STATION_ZIP_OLEDB, CRFILE=CITIBIKE/CITIBIKE_PARTIAL_OLEDB, CRINCLUDE=ALL, CRJOINTYPE=INNER, JOIN_WHERE=STATION_ID EQ START_STATION_ID;, $
The following request issues a left outer join from a larger version of the trip data file to the cluster:
SET FOCTRANSFORM = NESTED_CLUSTERS/&VALUE SET SHORTPATH = SQL JOIN LEFT_OUTER START_STATION_ID IN CITIBIKE_TRIPDATA TAG T1 TO ALL STATION_ID IN STATION_TRIP_CLS TAG T2 AS J1 TABLE FILE CITIBIKE_TRIPDATA " NESTED_CLUSTERS/&VALUE" " " SUM CNT.T1.START_STATION_ID AS T1,Station CNT.ZIP_CODE CNT.T2.START_STATION_ID AS T2,Station ON TABLE SET PAGE NOLEAD ON TABLE SET STYLE * GRID=OFF,$ ENDSTYLE END
Running the request with &VALUE set to OFF generates the following trace:
SELECT COUNT(T1."START_STATION_ID"), COUNT(T2."ZIP_CODE"), COUNT(T3."START_STATION_ID") FROM ( ( citibike_tripdata_mssqloledb T1 LEFT OUTER JOIN station_zip_oledb T2 ON T2."STATION_ID" = T1."START_STATION_ID" ) INNER JOIN citibike_partial_msoledb T3 ON (T3."START_STATION_ID" = T2."STATION_ID") );
The output is shown in the following image. The inner join was done last, reducing the number of stations in the host file to the same number as in the cluster.
Running the request with &VALUE set to ON generates the following trace. Two SQL scripts are generated, one for the host file and one for the join in the cluster. Then, the left outer join is performed against the result of the inner join:
SELECT T1."START_STATION_ID" AS "SK001_START_STATION_ID", COUNT(T1."START_STATION_ID") AS "VB001_CNT_START_STATION_ID" FROM citibike_tripdata_mssqloledb T1 GROUP BY T1."START_STATION_ID"; (FOC2546) SQL SCRIPT __CITIBIKE_TRIPDATA_OLEDB_CITIBIKE_TRIPDATA_OLEDB.SQL CREATED SUCCESSFULLY (BUT NOT EXECUTED) _EDATEMP/__citibike_tripdata_oledb_citibike_tripdata_oledb HELD AS SQL_SCRIPT SELECT T1."STATION_ID" AS "SK001_STATION_ID", COUNT(T1."ZIP_CODE") AS "VB001_CNT_ZIP_CODE", COUNT(T2."START_STATION_ID") AS "VB002_CNT_START_STATION_ID" FROM station_zip_oledb T1, citibike_partial_msoledb T2 WHERE (T2."START_STATION_ID" = T1."STATION_ID") GROUP BY T1."STATION_ID"; (FOC2546) SQL SCRIPT __CITIBIKE_TRIPDATA_OLEDB_STATION_PARTIAL_OLEDB_CLS.SQL CREATED SUCCESSFULLY (BUT NOT EXECUTED) _EDATEMP/__citibike_tripdata_oledb_station_partial_oledb_cls HELD AS SQL_SCRIPT
SELECT SUM(T1."VB001_CNT_START_STATION_ID"), SUM(T2."VB001_CNT_ZIP_CODE"), SUM(T2."VB002_CNT_START_STATION_ID") FROM ( ( /* vvv */ SELECT T1."START_STATION_ID" AS "SK001_START_STATION_ID", COUNT(T1."START_STATION_ID") AS "VB001_CNT_START_STATION_ID" FROM citibike_tripdata_mssqloledb T1 GROUP BY T1."START_STATION_ID" ) /* ^^^ */ T1 LEFT OUTER JOIN ( /* vvv */ SELECT T1."STATION_ID" AS "SK001_STATION_ID", COUNT(T1."ZIP_CODE") AS "VB001_CNT_ZIP_CODE", COUNT(T2."START_STATION_ID") AS "VB002_CNT_START_STATION_ID" FROM station_zip_oledb T1, citibike_partial_msoledb T2 WHERE (T2."START_STATION_ID" = T1."STATION_ID") GROUP BY T1."STATION_ID" ) /* ^^^ */ T2 ON T2."SK001_STATION_ID" = T1."SK001_START_STATION_ID" );
The output is shown in the following image. The left outer join was done last, maintaining the original number of stations in the host file.