Controlling Collation Sequence

How to:

Reference:

Collation is defined as a set of rules that apply to the ordering and matching of all language elements that involve comparison of two values. A wide variety of elements are affected by this feature. Among these features are sorting, aggregation, WHERE conditions, and StyleSheets. By default, items are sorted based on their binary values. The COLLATION settings SRV_CI and SRV_CS, case-insensitive and case-sensitive collation, implement collation based on the LANGUAGE setting. Case-insensitive collation means that all WHERE clauses and sorts ignore the case of the elements being compared. COLLATION is a session level setting (it is not supported in an ON TABLE phrase and should be set in the edasprof server profile).

The collation setting applies only to alphanumeric values.

Syntax: How to Establish Binary or Case-Insensitive Collation Sequence

Add the following command to the server edasprof.prf profile:

SET COLLATION = {BINARY|SRV_CI|SRV_CS|CODEPAGE}

where:

BINARY

Bases the collation sequence on binary values.

SRV_CI

Bases collation sequence on the LANGUAGE setting, and is case-insensitive.

SRV_CS

Bases collation sequence on the LANGUAGE setting, and is case-sensitive.

CODEPAGE

Bases collation sequence on the code page in effect, and is case-sensitive. CODEPAGE is the default value.

In most cases, CODEPAGE is the same as BINARY. The only differences are for Danish, Finnish, German, Norwegian, and Swedish in an EBCDIC environment.

Reference: Usage Notes for SET COLLATION

  • SUFFIX=FIX or SUFFIX=FOCUS/XFOCUS HOLD files created in one mode may not be usable as targets for a JOIN in another mode if the join field is on alphanumeric data with mixed-cases.
  • FIXRETRIEVE is supported only for binary data, so setting COLLATION to anything other than BINARY will turn FIXRETRIEVE OFF, which may affect join performance.

Rules for Sorting and Aggregation

  • Records with the same characters in the same order, but with variations in case, are considered to be identical. If multiple input records have these variations, the value used is from the first such record.
  • In a detail level report, the sort value is the same for each output record. That value will be the one for the input record that had the lowest value (collated first).
  • When the MIN (or the MAX) value for two or more alphanumeric display fields having a given instance of the sort field values is the same by case-insensitive collation, but the two values vary by case in some positions, the one retained is the last one in the input file (highest input record number) when SUMPREF=LST and the first (lowest record number) when SUMPREF=FST.

Example: Using Binary and Case-Insensitive Collation Sequence for Sorting

The following request creates a TIBCO FOCUS® data source named COLLATE that has some records with product names that differ only by the case of one letter:

CREATE FILE COLLATE                                                
-RUN                                                               
MODIFY FILE COLLATE                                                
FIXFORM PROD_NUM/C4 PRODNAME/C30 QTY_IN_STOCK/C7 PRICE/C12 COST/C12
CHECK OFF                                                          
DATA                                                               
10042 Hd VCR LCD Menu               43068      179.00      129.00  
10052 HD VCR LCD Menu               43068      179.00      129.00  
1006Combo Player - 4 HD VCR + DVD   13527      399.00      289.00  
1007Combo Player - 4 Hd VCR + DVD   13527      399.00      289.00  
1008DVD Upgrade Unit for Cent. VCR    199      199.00      139.00  
1010750SL Digital Camcorder 300 X   10758      999.00      750.00  
1012650DL Digital Camcorder 150 X    2972      899.00      710.00  
1014340SX Digital Camera 65K P        990      249.00      199.00  
1015340SX digital Camera 65K P        990      249.00      199.00
1016330DX Digital Camera 1024K P    12707      279.00      199.00
1018250 8MM Camcorder 40 X          60073      399.00      320.00
1019250 8mm Camcorder 40 X          60073      399.00      320.00
1020150 8MM Camcorder 20 X           5961      319.00      240.00
1022120 VHS-C Camcorder 40 X         2300      399.00      259.00
1024110 VHS-C Camcorder 20 X         4000      349.00      249.00
1026AR2 35mm Camera 8 X             12444      129.00       95.00
1029AR2 35MM Camera 8 X             11499      109.00       79.00
1028AR3 35MM Camera 10 X            11499      109.00       79.00
1030QX Portable CD Player           22000      169.00       99.00
1032R5 Micro Digital Tape Recorder   1990       89.00       69.00
1034ZT Digital PDA - Commercial     21000      499.00      349.00
1036ZC Digital PDA - Standard       33000      299.00      249.00
END

The following request prints the values of PRODNAME in the order in which they are encountered in the input stream:

TABLE FILE COLLATE
PRINT PROD_NUM PRODNAME
END

On the output, the rows with product numbers 1004 and 1005 differ only in the case of the letter d in HD. The record with the lowercase d is before the record with the uppercase D. The rows with record numbers 1006 and 1007 also differ only in the case of the letter d in HD. In this case, the record with the uppercase D is before the record with the lowercase d:

Product  Product                       
Number:  Name:                         
-------  -------                       
1004     2 Hd VCR LCD Menu             
1005     2 HD VCR LCD Menu             
1006     Combo Player - 4 HD VCR + DVD 
1007     Combo Player - 4 Hd VCR + DVD 
1008     DVD Upgrade Unit for Cent. VCR
1010     750SL Digital Camcorder 300 X 
1012     650DL Digital Camcorder 150 X 
1014     340SX Digital Camera 65K P    
1015     340SX digital Camera 65K P    
1016     330DX Digital Camera 1024K P  
1018     250 8MM Camcorder 40 X        
1019     250 8mm Camcorder 40 X        
1020     150 8MM Camcorder 20 X        
1022     120 VHS-C Camcorder 40 X      
1024     110 VHS-C Camcorder 20 X      
1026     AR2 35mm Camera 8 X           
1029     AR2 35MM Camera 8 X           
1028     AR3 35MM Camera 10 X          
1030     QX Portable CD Player         
1032     R5 Micro Digital Tape Recorder
1034     ZT Digital PDA - Commercial   
1036     ZC Digital PDA - Standard

The next request sorts the output in BINARY order. The setting COLLATION = BINARY is in effect:

TABLE FILE COLLATE      
PRINT PROD_NUM          
BY PRODNAME             
END

In an EBCDIC environment, the records with the lowercase letters sort in front of the records with the uppercase letters, so the row with product number 1007 sorts in front of the row with product number 1006:

Product                         Product
Name:                           Number:
-------                         -------
AR2 35mm Camera 8 X             1026   
AR2 35MM Camera 8 X             1029   
AR3 35MM Camera 10 X            1028   
Combo Player - 4 Hd VCR + DVD   1007   
Combo Player - 4 HD VCR + DVD   1006   
DVD Upgrade Unit for Cent. VCR  1008   
QX Portable CD Player           1030   
R5 Micro Digital Tape Recorder  1032   
ZC Digital PDA - Standard       1036   
ZT Digital PDA - Commercial     1034   
110 VHS-C Camcorder 20 X        1024   
120 VHS-C Camcorder 40 X        1022   
150 8MM Camcorder 20 X          1020   
2 Hd VCR LCD Menu               1004   
2 HD VCR LCD Menu               1005   
250 8mm Camcorder 40 X          1019   
250 8MM Camcorder 40 X          1018   
330DX Digital Camera 1024K P    1016   
340SX digital Camera 65K P      1015   
340SX Digital Camera 65K P      1014   
650DL Digital Camcorder 150 X   1012   
750SL Digital Camcorder 300 X   1010

In an ASCII environment, the records with the uppercase letters sort in front of the records with the lowercase letters, so the row with product number 1005 sorts in front of the row with product number 1004:

Product                         Product 
Name:                           Number:
-------                         ------- 
110 VHS-C Camcorder 20 X        1024   
120 VHS-C Camcorder 40 X        1022   
150 8MM Camcorder 20 X          1020   
2 HD VCR LCD Menu               1005   
2 Hd VCR LCD Menu               1004   
250 8MM Camcorder 40 X          1018   
250 8mm Camcorder 40 X          1019   
330DX Digital Camera 1024K P    1016   
340SX Digital Camera 65K P      1014   
340SX digital Camera 65K P      1015   
650DL Digital Camcorder 150 X   1012   
750SL Digital Camcorder 300 X   1010   
AR2 35MM Camera 8 X             1029   
AR2 35mm Camera 8 X             1026   
AR3 35MM Camera 10 X            1028   
Combo Player - 4 HD VCR + DVD   1006   
Combo Player - 4 Hd VCR + DVD   1007   
DVD Upgrade Unit for Cent. VCR  1008   
QX Portable CD Player           1030   
R5 Micro Digital Tape Recorder  1032   
ZC Digital PDA - Standard       1036   
ZT Digital PDA - Commercial     1034

With COLLATION set to SRV_CI and a sort on the PRODNAME field, the uppercase and lowercase letters have the same value, so the row displays only once for multiple record numbers. For example, the rows with product numbers 1004 and 1005 display with the same PRODNAME value and the sort field value for the display is the first one in the input stream.

The following shows the output in an EBCDIC environment:

Product                         Product
Name:                           Number:
-------                         -------
AR2 35mm Camera 8 X             1026   
                                1029   
AR3 35MM Camera 10 X            1028   
Combo Player - 4 HD VCR + DVD   1006   
                                1007   
DVD Upgrade Unit for Cent. VCR  1008   
QX Portable CD Player           1030   
R5 Micro Digital Tape Recorder  1032   
ZC Digital PDA - Standard       1036   
ZT Digital PDA - Commercial     1034   
110 VHS-C Camcorder 20 X        1024   
120 VHS-C Camcorder 40 X        1022   
150 8MM Camcorder 20 X          1020   
2 Hd VCR LCD Menu               1004   
                                1005   
250 8MM Camcorder 40 X          1018   
250 8MM Camcorder 40 X          1019
330DX Digital Camera 1024K P    1016
340SX Digital Camera 65K P      1014
                                1015
650DL Digital Camcorder 150 X   1012
750SL Digital Camcorder 300 X   1010

The following shows the output in an ASCII environment:

Product                         Product   
Name:                           Number:   
-------                         -------    
110 VHS-C Camcorder 20 X        1024   
120 VHS-C Camcorder 40 X        1022   
150 8MM Camcorder 20 X          1020   
2 Hd VCR LCD Menu               1004   
                                1005   
250 8MM Camcorder 40 X          1018   
                                1019   
330DX Digital Camera 1024K P    1016   
340SX Digital Camera 65K P      1014   
                                1015   
650DL Digital Camcorder 150 X   1012   
750SL Digital Camcorder 300 X   1010   
AR2 35mm Camera 8 X             1026   
                                1029   
AR3 35MM Camera 10 X            1028   
Combo Player - 4 HD VCR + DVD   1006   
                                1007   
DVD Upgrade Unit for Cent. VCR  1008   
QX Portable CD Player           1030   
R5 Micro Digital Tape Recorder  1032   
ZC Digital PDA - Standard       1036   
ZT Digital PDA - Commercial     1034

Example: Using Binary and Case-Insensitive Collation Sequence for Selection

The following request against the COLLATE data source selects records in which the PRODNAME contains the characters 'HD':

TABLE FILE COLLATE          
PRINT PROD_NUM PRODNAME     
WHERE PRODNAME CONTAINS 'HD'
END

With COLLATION set to BINARY, only the records with an exact match (uppercase HD) are selected. The output is:

Product  Product                      
Number:  Name:                        
-------  -------                      
1005     2 HD VCR LCD Menu            
1006     Combo Player - 4 HD VCR + DVD

Running the same request but changing the COLLATION parameter to SRV_CI selects all records with any combination of uppercase and lowercase values for H and D. The rows are displayed in the order in which they appeared in the data source:

Product  Product                      
Number:  Name:                        
-------  -------                      
1004     2 Hd VCR LCD Menu            
1005     2 HD VCR LCD Menu            
1006     Combo Player - 4 HD VCR + DVD
1007     Combo Player - 4 Hd VCR + DVD