Navigating Joins Between Cluster Synonyms

Reference:

By default, when joining cluster synonyms, a hierarchy of segments is constructed from all of the joined files, and the resulting hierarchy is navigated in top-to-bottom, left-to-right order.

Therefore, if a left outer join is specified from a host synonym to a cluster that has an inner join, the inner join will be performed last and may remove rows from the host file, counteracting the purpose of the left outer join. Using the SET FOCTRANSFORM = NESTED_CLUSTERS/ON command, you can force the joins in the target cluster to be performed prior to the join between the host and target synonyms. When you use this setting, SQL scripts are used to join the tables in the target cluster prior to implementing the join to the host file. The left outer join will be performed last and will retain all rows in the host synonym.

The syntax is:

SET FOCTRANSFORM = {NESTED_CLUSTERS/OFF|NESTED_CLUSTERS/ON}

where:

NESTED_CLUSTERS/OFF

Maintains the left-to-right, top-to-bottom order of segment navigation. This is the default value.

NESTED_CLUSTERS/ON

Performs the joins in the target cluster synonym prior to joining the host synonym to the result.

Reference: Usage Notes for Joins to Cluster Synonyms

  • Using the SET FOCTRANSFORM = NESTED_CLUSTERS feature requires that the joins be optimized. The command SET SHORTPATH = SQL must be in effect for combinations of inner and outer joins with the setting FOCTRANSFORM = NESTED_CLUSTERS/OFF, in order for the request to be optimized. The SHORTPATH = SQL setting has no effect on optimization with the setting FOCTRANSFORM = NESTED_CLUSTERS/ON.
  • You cannot join to a non-root segment of a cluster synonym. If you issue a join to a non-root segment, the following message displays and the request terminates:
    (FOC906) JOIN TO NON-ROOT SEGMENT segname IS NOT ALLOWED FOR NESTED_CLUSTERS

Example: Navigating Joins Between Cluster Synonyms

This example uses SQL Server data sources generated from a file of citibike trips uploaded from https://www.citibikenyc.com/system-data, and from a file of zip codes for the stations used for the trips (you can download this file from https://techsupport.informationbuilders.com/public/station_zip.csv).

A cluster synonym named station_trip_cls joins the station zip data source to a data source containing partial trip data (with only a few rows). The following shows the inner join defined in the cluster synonym:

FILENAME=STATION_TRIP_CLS, $
  SEGMENT=STATION_ZIP_OLEDB, CRFILE=CITIBIKE/STATION_ZIP_OLEDB, CRINCLUDE=ALL, $
  SEGMENT=CITIBIKE_PARTIAL_OLEDB, SEGTYPE=KU, PARENT=STATION_ZIP_OLEDB,
    CRFILE=CITIBIKE/CITIBIKE_PARTIAL_OLEDB, CRINCLUDE=ALL, CRJOINTYPE=INNER,
    JOIN_WHERE=STATION_ID EQ START_STATION_ID;, $

The following request issues a left outer join from a larger version of the trip data file to the cluster:

SET FOCTRANSFORM = NESTED_CLUSTERS/&VALUE
SET SHORTPATH = SQL
JOIN LEFT_OUTER START_STATION_ID IN CITIBIKE_TRIPDATA TAG T1 TO ALL STATION_ID IN STATION_TRIP_CLS TAG T2 AS J1
TABLE FILE CITIBIKE_TRIPDATA
" NESTED_CLUSTERS/&VALUE"
" "
SUM CNT.T1.START_STATION_ID AS T1,Station CNT.ZIP_CODE CNT.T2.START_STATION_ID AS T2,Station
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

Running the request with &VALUE set to OFF generates the following trace:

SELECT   
   COUNT(T1."START_STATION_ID"),  
   COUNT(T2."ZIP_CODE"),  
   COUNT(T3."START_STATION_ID")  
   FROM   
   ( ( citibike_tripdata_mssqloledb T1  
   LEFT OUTER JOIN   
   station_zip_oledb T2  
   ON T2."STATION_ID" = T1."START_STATION_ID" )  
   INNER JOIN   
   citibike_partial_msoledb T3  
   ON (T3."START_STATION_ID" = T2."STATION_ID") ); 

The output is shown in the following image. The inner join was done last, reducing the number of stations in the host file to the same number as in the cluster.

Running the request with &VALUE set to ON generates the following trace. Two SQL scripts are generated, one for the host file and one for the join in the cluster. Then, the left outer join is performed against the result of the inner join:

SELECT   
   T1."START_STATION_ID" AS "SK001_START_STATION_ID",  
    COUNT(T1."START_STATION_ID") AS "VB001_CNT_START_STATION_ID"  
    FROM   
   citibike_tripdata_mssqloledb T1  
    GROUP BY   
   T1."START_STATION_ID"; 
   (FOC2546) SQL SCRIPT __CITIBIKE_TRIPDATA_OLEDB_CITIBIKE_TRIPDATA_OLEDB.SQL CREATED SUCCESSFULLY (BUT NOT EXECUTED)
   _EDATEMP/__citibike_tripdata_oledb_citibike_tripdata_oledb HELD AS SQL_SCRIPT 
    SELECT   
  T1."STATION_ID" AS "SK001_STATION_ID",  
    COUNT(T1."ZIP_CODE") AS "VB001_CNT_ZIP_CODE",  
    COUNT(T2."START_STATION_ID") AS "VB002_CNT_START_STATION_ID"  
    FROM   
   station_zip_oledb T1,  
   citibike_partial_msoledb T2  
    WHERE   
   (T2."START_STATION_ID" = T1."STATION_ID")  
    GROUP BY   
   T1."STATION_ID"; 
  (FOC2546) SQL SCRIPT __CITIBIKE_TRIPDATA_OLEDB_STATION_PARTIAL_OLEDB_CLS.SQL CREATED SUCCESSFULLY (BUT NOT EXECUTED)
  _EDATEMP/__citibike_tripdata_oledb_station_partial_oledb_cls HELD AS SQL_SCRIPT 
    SELECT   
    SUM(T1."VB001_CNT_START_STATION_ID"),  
    SUM(T2."VB001_CNT_ZIP_CODE"),  
    SUM(T2."VB002_CNT_START_STATION_ID")  
    FROM   
   (   
   ( /* vvv */   
      SELECT   
     T1."START_STATION_ID" AS "SK001_START_STATION_ID",  
      COUNT(T1."START_STATION_ID") AS  
     "VB001_CNT_START_STATION_ID"  
      FROM   
     citibike_tripdata_mssqloledb T1  
      GROUP BY   
     T1."START_STATION_ID"  
   ) /* ^^^ */ T1  
    LEFT OUTER JOIN   
   ( /* vvv */   
      SELECT   
     T1."STATION_ID" AS "SK001_STATION_ID",  
      COUNT(T1."ZIP_CODE") AS "VB001_CNT_ZIP_CODE",  
      COUNT(T2."START_STATION_ID") AS  
     "VB002_CNT_START_STATION_ID"  
      FROM   
     station_zip_oledb T1,  
     citibike_partial_msoledb T2  
      WHERE   
     (T2."START_STATION_ID" = T1."STATION_ID")  
      GROUP BY   
     T1."STATION_ID"  
   ) /* ^^^ */ T2  
    ON T2."SK001_STATION_ID" = T1."SK001_START_STATION_ID" ); 

The output is shown in the following image. The left outer join was done last, maintaining the original number of stations in the host file.