Types of MATCH Processing

How to:

Controlling MATCH Processing
Specify Merge Phrases

Reference:

Usage Notes for Match Requests
Merge Phrases

There are two types of MATCH processing, grouped and ungrouped. Grouped processing is the newer type of MATCH processing, and is the processing used by default. Ungrouped processing is the legacy MATCH processing. If you need to invoke legacy processing, you can use the SET MATCHCOLUMNORDER = UNGROUPED command.

In all MATCH requests:

The two sides of the merge, the OLD and the NEW, are matched together based on their common high-order BY fields. In order for actual matching to take place between the OLD and NEW files, the high-order BY fields have to be the same. If there are no common high-order BY fields, the records are concatenated on a record-by-record basis.
The output selected from the OLD and NEW sides is based on the AFTER MATCH command.

The output stage of the MATCH differs for grouped and ungrouped processing.

With grouped processing, the output file has each common sort key (BY field) followed its display fields in the order specified in the request. As a result, ungrouped processing is limited to flat file output.
In contrast, with grouped processing, fields in the request are grouped with their highest common sort fields in the output file. This enables you to generate multi-segment hierarchical output files in FOCUS or XFOCUS format by using multiple display commands on each side of the merge. In fact, you can create the output file using any format that has a corresponding Master File.

For standard MATCH requests that use the same sort keys on both sides of the MATCH (OLD and NEW), grouped and ungrouped processing produce the same output.

However, in requests that use multiple display commands or differing sort fields on each side of the merge, the grouping of fields with their sort keys can produce output files in which the field order is different from the legacy processing. In any case where the new behavior generates output that is different from previous results and not desireable, the SET MATCHCOLUMNORDER command is available to return the legacy results.

For example, if you use MATCH to create output that includes a list of products with columns of aggregations based on differing sorts, MATCHCOLUMNORDER=UNGROUPED will ensure that the sequence of the column output will remain what it was in the past.

The way MATCH merges data depends on the order in which you name data sources in the request, the BY fields, display commands, the type of processing, and the merge phrases you use. In general, however, processing is as follows:

MATCH retrieves requested records from the first data source you name, and writes them to a temporary work area.
MATCH retrieves requested records from the second data source you name, and writes them to a temporary work area.
It compares the common high-order sort fields from the retrieved records as specified in the merge phrase (for example, OLD-OR-NEW). For more information, see Merge Phrases.
If the default grouped processing is in effect, it may re-order the fields to group them under their common sort fields.
It writes the merged results of the comparison to a temporary data source (if there are more MATCH operations). It cycles through all data sources named until END is encountered.
It writes final records to the HOLD file.

Syntax: How to Controlling MATCH Processing

SET MATCHCOLUMNORDER = {GROUPED|UNGROUPED}

where:

GROUPED: Groups fields in the output file under their common high-order sort fields. This is the default value.
UNGROUPED: Does not group fields in the output file with their common hig-order sort fields, but lays them out as specified in the MATCH request.

Reference: Usage Notes for Match Requests

With ungrouped processing, you cannot specify a format for the HOLD file generated by MATCH. It will be created as a single-segment BINARY or ALPHA HOLD file, depending on the value of the HOLDFORMAT parameter. The merge process does not change the original data sources.
Alias names are assigned sequentially (E01, E02, ...) in the HOLD Master File that results from the MATCH request. When the same field name is used mutliple times in the MATCH, users distinguish between them in requests against the HOLD file by referencing these alias names instead of the field names.
With grouped processing, fields are rearranged in the Master File, and this causes the alias names to represent different fields from the same alias names assigned with ungrouped processing. This can produce different results if you switch from one type of processing to the other.

To avoid using alias names, use the AS phrase in your MATCH request to create distinct field names (except for the common high-order BY fields, which have to be the same), and use those field names in requests against the HOLD file.
The ACROSS, BY HIGHEST/LOWEST n, IN-GROUPS-OF, WHERE TOTAL, and IF TOTAL phrases, and the COMPUTE command, are not permitted in a MATCH request. You can, however, use the DEFINE command.
Up to 128 BY phrases and the maximun number of display fields can be used in each MATCH request. The count of sort sets includes the number of common sort fields. The maximum number of display fields is determined by a combination of factors.
For details, see Displaying Report Data.
You must specify at least one BY field for each file used in the MATCH request.
When used with MATCH, the SET HOLDLIST parameter behaves as if HOLDLIST were set to ALL.
The following prefix operators are not supported in MATCH requests: DST., DST.CNT., RNK., ST., and CT.

Example: Merging Data Sources

In the following request, the high-order sort field is the same for both files, so the result is the same using grouped and ungrouped processing.

MATCH FILE EDUCFILE
SUM COURSE_CODE
BY EMP_ID
RUN
FILE EMPLOYEE
SUM LAST_NAME AND FIRST_NAME
BY EMP_ID BY CURR_SAL
AFTER MATCH HOLD OLD-OR-NEW
END
-******************************
-*  PRINT CONTENTS OF HOLD FILE
-******************************
TABLE FILE HOLD
PRINT *
END

The merge phrase used in this example was OLD-OR-NEW. This means that records from both the first (old) data source plus the records from the second (new) data source appear in the HOLD file.

The output is:

EMP_ID     COURSE_CODE         CURR_SAL  LAST_NAME        FIRST_NAME
------     -----------         --------  ---------        ----------
071382660  101               $11,000.00  STEVENS          ALFRED    
112847612  103               $13,200.00  SMITH            MARY      
117593129  203               $18,480.00  JONES            DIANE     
119265415  108                $9,500.00  SMITH            RICHARD   
119329144                    $29,700.00  BANNING          JOHN      
123764317                    $26,862.00  IRVING           JOAN      
126724188                    $21,120.00  ROMANS           ANTHONY   
212289111  103                     $.00                             
219984371                    $18,480.00  MCCOY            JOHN      
315548712  108                     $.00                             
326179357  301               $21,780.00  BLACKWOOD        ROSEMARIE 
451123478  101               $16,100.00  MCKNIGHT         ROGER     
543729165                     $9,000.00  GREENSPAN        MARY      
818692173  302               $27,062.00  CROSS            BARBARA

Example: Comparing Grouped and Ungrouped Processing

The following MATCH request has two SUM commands and one PRINT command for each file, with all sort fields common to both files. The SET MATCHCOLUMNORDER = UNGROUPED command is issued to invoke legacy processing.

SET MATCHCOLUMNORDER = UNGROUPED
MATCH FILE GGSALES
SUM DOLLARS
BY ST 
SUM BUDDOLLARS BY ST BY CITY
PRINT UNITS BY ST BY CITY BY CATEGORY
RUN
FILE GGSALES
SUM DOLLARS
BY ST
SUM BUDDOLLARS BY ST BY CITY
PRINT BUDUNITS BY ST BY CITY BY CATEGORY
AFTER MATCH HOLD OLD-OR-NEW
END
TABLE FILE HOLD 
PRINT *
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
TYPE=REPORT, GRID=OFF, SIZE=9,$
ENDSTYLE
END

The HOLD Master File follows. Since the sort fields are common to both files, the two files were merged based on those fields. However, note that the order of fields in the Master File follows the order in the request, the highest-level sort field followed by its display field, then the next sort field followed by its display fields, and so on.

FILENAME=HOLD, SUFFIX=FIX     , IOTYPE=BINARY, $
  SEGMENT=HOLD, SEGTYPE=S1, $
    FIELDNAME=ST, ALIAS=E01, USAGE=A02, ACTUAL=A04, $
    FIELDNAME=DOLLARS, ALIAS=E02, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=CITY, ALIAS=E03, USAGE=A20, ACTUAL=A20, $
    FIELDNAME=BUDDOLLARS, ALIAS=E04, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=CATEGORY, ALIAS=E05, USAGE=A11, ACTUAL=A12, $
    FIELDNAME=UNITS, ALIAS=E06, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=DOLLARS, ALIAS=E07, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=BUDDOLLARS, ALIAS=E08, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=BUDUNITS, ALIAS=E09, USAGE=I08, ACTUAL=I04, $

The partial output is shown in the following image.

Changing the UNGROUPED setting to GROUPED produces the following Master File. The fields that have the same common sort fields from both files are moved to be under those sort fields in the Master File.

FILENAME=HOLD, SUFFIX=FIX     , IOTYPE=BINARY, $
  SEGMENT=HOLD, SEGTYPE=S1, $
    FIELDNAME=ST, ALIAS=E01, USAGE=A02, ACTUAL=A04, $
    FIELDNAME=DOLLARS, ALIAS=E02, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=DOLLARS, ALIAS=E03, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=CITY, ALIAS=E04, USAGE=A20, ACTUAL=A20, $
    FIELDNAME=BUDDOLLARS, ALIAS=E05, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=BUDDOLLARS, ALIAS=E06, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=CATEGORY, ALIAS=E07, USAGE=A11, ACTUAL=A12, $
    FIELDNAME=UNITS, ALIAS=E08, USAGE=I08, ACTUAL=I04, $
    FIELDNAME=BUDUNITS, ALIAS=E09, USAGE=I08, ACTUAL=I04, $

The partial output is shown in the following image.

For the request that uses the GROUPED value for MATCHCOLUMNORDER, you can change the HOLD command to produce a FORMAT FOCUS output file, as follows.

AFTER MATCH HOLD FORMAT FOCUS OLD-OR-NEW

The following hierarchical multi-segment Master File is generated.

FILENAME=HOLD    , SUFFIX=FOC     , $
  SEGMENT=SEG01, SEGTYPE=S1, $
    FIELDNAME=ST, ALIAS=E01, USAGE=A02,
      TITLE='State', DESCRIPTION='State', $
    FIELDNAME=DOLLARS, ALIAS=E02, USAGE=I08,
      TITLE='Dollar Sales', DESCRIPTION='Total dollar amount of reported sales', $
    FIELDNAME=DOLLARS, ALIAS=E03, USAGE=I08,
      TITLE='Dollar Sales', DESCRIPTION='Total dollar amount of reported sales', $
  SEGMENT=SEG02, SEGTYPE=S1, PARENT=SEG01, $
    FIELDNAME=CITY, ALIAS=E04, USAGE=A20,
      TITLE='City', DESCRIPTION='City', $
    FIELDNAME=BUDDOLLARS, ALIAS=E05, USAGE=I08,
      TITLE='Budget Dollars', DESCRIPTION='Total sales quota in dollars', $
    FIELDNAME=BUDDOLLARS, ALIAS=E06, USAGE=I08,
      TITLE='Budget Dollars', DESCRIPTION='Total sales quota in dollars', $
  SEGMENT=SEG03, SEGTYPE=S2, PARENT=SEG02, $
    FIELDNAME=CATEGORY, ALIAS=E07, USAGE=A11,
      TITLE='Category', DESCRIPTION='Product category', $
    FIELDNAME=FOCLIST, ALIAS=E08, USAGE=I5, $
    FIELDNAME=UNITS, ALIAS=E09, USAGE=I08,
      TITLE='Unit Sales', DESCRIPTION='Number of units sold', $
  SEGMENT=SEG04, SEGTYPE=S1, PARENT=SEG03, $
    FIELDNAME=FOCLIST, ALIAS=E10, USAGE=I5, $
    FIELDNAME=BUDUNITS, ALIAS=E11, USAGE=I08,
      TITLE='Budget Units', DESCRIPTION='Number of units budgeted', $

Reference: Merge Phrases

MATCH logic depends on the concept of old and new data sources. Old refers to the first data source named in the request, and new refers to the second data source. The result of each merge creates a HOLD file until the END command is encountered.

The following diagram illustrates the general merge process:

Syntax: How to Specify Merge Phrases

AFTER MATCH HOLD [AS 'name'] mergetype

where:

AS 'name'

Specifies the name of the extract data source created by the MATCH command. The default is HOLD.

mergetype

Specifies how the retrieved records from the files are to be compared.

The results of each phrase are graphically represented using Venn diagrams. In the diagrams, the left circle represents the old data source, the right circle represents the new data source, and the shaded areas represent the data that is written to the HOLD file.

OLD-OR-NEW specifies that all records from both the old data source and the new data source appear in the HOLD file. This is the default if the AFTER MATCH line is omitted.

OLD-AND-NEW specifies that records that appear in both the old and new data sources appear in the HOLD file. (The intersection of the sets.)

OLD-NOT-NEW specifies that records that appear only in the old data source appear in the HOLD file.

NEW-NOT-OLD specifies that records that appear only in the new data source appear in the HOLD file.

OLD-NOR-NEW specifies that only records that are in the old data source but not in the new data source, or in the new data source but not in the old, appear in the HOLD file (the complete set of non-matching records from both data sources).

OLD specifies that all records from the old data source, and any matching records from the new data source, are merged into the HOLD file.

NEW specifies that all records from the new data source, and any matching records from the old data source, are merged into the HOLD file.