Improving Efficiency With External Sorts

In this section:

How to:

Reference:

When a report is generated, by default it is sorted using an internal sorting procedure. This sorting procedure is optimized for reports of up to approximately 180 to 200K, although many factors affect the size of the data that can be handled by the internal sort.

The FOCSORT file used for the internal sort can grow to any size allowed by the operating system running and the available disk space. The user does not have to break a request up to accommodate massive files. In previous releases, the FOCSORT file was limited to 2 GB and the user received a FOC298 message when the WebFOCUS limit was exceeded. With no limit enforced by WebFOCUS, the operating system provides whatever warning and error handling it has for the management of a FOCSORT file that exceeds its limits.

You can generate larger reports somewhat faster by using dedicated sorting products, such as SyncSort, DFSORT, or, in non-Mainframe environments, the WebFOCUS external sort routines.

To use an external sort, the EXTSORT parameter must be ON. Use of a StyleSheet turns off external sorting.

Note that in Mainframe environments, external sorting is supported with the French, Spanish, German, and Scandinavian National Languages (Swedish, Danish, Finnish, and Norwegian). To specify the National Language Support Environment, use the LANG parameter as described in the TIBCO WebFOCUS® Developing Reporting Applications manual.

Reference: Requirements for External Sorting

You can use the DFSORT and SyncSort external sort products with any TABLE, FML, GRAPH, or MATCH request in all WebFOCUS Mainframe environments. In other operating environments, WebFOCUS has its own external sort routines.

Reference: Usage Notes for External Sorting in Non-Mainframe Environments

It is probably best not to use external sort if:

  • Your request requires a matrix (cannot be converted to a TABLEF request). If your request needs a matrix and uses external sort, it will go through two sorts, both external and internal, and it will be hard to realize any performance gains.

    To tell if your report is convertible to TABLEF, use ? STAT (as described in Query the Sort Type) or run an abbreviated version of the request with a low record limit and external sort on. If the report statistics are printed after the TABLE output, it was performed as TABLEF; if the statistics are printed before the first screen of TABLE output, it went through TABLE processing because it was not convertible to TABLEF.

  • Your input is sorted or almost sorted.
  • Your system cannot support a large number of work files (for information, see Sort Work Files and Return Codes). In this case the internal sort may do a better job since internally it implements about 60 logical work files, all sharing space in FOCSORT.

Procedure: How to Determine the Type of Sort Used

To determine which sort is used, the following criteria are evaluated, in this sequence:

  1. BINS. If an entire report can be sorted within the work area (BINS), the external sort is not invoked, even if EXTSORT is set ON.
  2. EXTERNAL. If BINS is not large enough to sort the entire report and EXTSORT is set ON, the external sort utility will be invoked.

Syntax: How to Control External Sorting

You can turn the external sorting feature on and off using the SET EXTSORT command.

SET EXTSORT = {ON|OFF}

where:

ON

Enables the selective use of a dedicated external sorting product to sort reports. This value is the default in all Mainframe environments.

OFF

Uses the internal sorting procedure to sort all reports. This value is the default in all non-Mainframe environments.

Syntax: How to Query the Sort Type

To determine which sort is being used for a given report, issue the following command after the report request:

? STAT

The command displays the following values for the SORT USED parameter:

FOCUS

The internal sorting procedure was used to sort the entire report.

SQL

You are using a relational data source and the RDBMS supplied data already in order.

EXTERNAL

An external sorting product sorted the report.

NONE

The report did not require sorting.

Providing an Estimate of Input Records or Report Size for Sorting

How to:

There are two advantages to providing an estimate for the input size (ESTRECORDS) or the report size (ESTLINES):

Syntax: How to Provide an Estimate of Input Records or Report Size for Sorting

ON TABLE SET ESTRECORDS nON TABLE SET ESTLINES n

where:

n

Is the estimated number of records or lines to be sorted.

Sort Work Files and Return Codes

Reference:

In non-Mainframe environments, external sorts use temporary work files to hold intermediate sorting results. For each type of external sort, you must be aware of how sort work files are created and used.

Reference: Sort Work Files on UNIX, Windows, and OpenVMS

While internal sorting uses only one work file, FOCSORT (allocated in the EDATEMP directory), external sort allows up to 31 work files, allocated on one or more disk drives (spindles) or directories.

Warning: Any one or more of these work files may become very large. Count on using many times the total disk space required by FOCSORT.

By default, five work files are allocated in the /tmp directory on UNIX, or in the directory pointed to by the TMP environment variable in Windows. This may not be enough sort work space and, even if the files fit in the directory, five files are probably not enough for optimal performance. Also, having all of the sort work files on the same disk may further degrade performance.

You have two other options:

  • Define the TMPDIR shell variable (UNIX) or TMP environment variable (Windows) to point to some suitable writable directory. For best results, this directory should be on a disk with a lot of available space, and not the same disk as the data source or the EDATEMP directory. Again, you will get five temporary work files allocated on the same spindle, with consequent performance degradation.
  • Define 1 to 31 shell variables of the form IBITMPDIR01 ... IBITMPDIR31 to point to one or more writable directories.

    If the UNIX TMPDIR or Windows TMP variable is set, it must be "unset" in order to make use of the IBITMPDIRnn variables. The UNIX command for unsetting the TMPDIR variable is:

    unset TMPDIR

    The Windows command for unsetting the TMP variable is:

    SET TMP=

    Different variables may point to the same directory, if desired. If you wish to allocate n work files, you must define variables 01 through n. The first variable missing from the environment determines the number of work files that will be used. (If you define fewer than five, additional files will be allocated using the system default location to make up the difference.) The more work files you allocate, and the more separated they are across different spindles, the better performance you should achieve. The major constraint is the total disk space available.

The work file names are generated by the ANSI tempnam function, however, the names all begin with the characters srtwk. If the sorting process ends normally or terminates because of a detectable error (typically, disk space overflow), all of the allocated work files are deleted. There is no explicit way to save them. If there is another type of abnormal termination, srtwk files may be left on the disk. You can and should erase them.

Reference: Sort Work Files on IBM i

On IBM i (formerly i5/OS), the number of work files is fixed at 9. They are virtual files.

Reference: WebFOCUS External Sort Return Codes

The WebFOCUS error message FOC909 is issued for all errors from external sort. An additional three-digit code is supplied, of which the last two digits are of interest. If you get an error number ending in:

  • 16, external sort did not have enough memory allocated. You can try reducing the number of work files.
  • 20, an I/O error occurred; in most cases, this means that one of the disks is not writable or has overflowed. Allocate the work files differently or reduce their number.
  • 28, one of the work files could not be opened. Check to make sure the pathname was specified correctly and that protections allow writing and reading.
  • 32, an internal logical error was detected in the sort processing. Report this problem to Technical Support Services.

Mainframe External Sort Utilities and Message Options

In this section:

How to:

By default, error messages created by a Mainframe external sort product are not displayed. However, you may wish to display these messages on your screen for diagnostic purposes.

Procedure: How to Select a Sort Utility and Message Options

You use the SET SORTLIB command to both specify the sort utility used at your site and, for DFSORT and SYNCSORT on z/OS, to display sort messages.

  1. Issue the SET SORTLIB command to specify the sort utility being used:
    SET SORTLIB = {sortutility|DEFAULT}

    where:

    sortutility

    Can be one of the following:

    • DFSORT for DFSORT without messages.
    • MVSMSGDF for DFSORT with messages.
    • SYNCSORT for SyncSort without messages.
    • MVSMSGSS for SyncSort with standard messages.
    • MVSMSGSD for SyncSort with debug (verbose) messages.
    • DEFAULT for DFSORT. However, It is more efficient and highly recommended that you explicitly specify the sort utility using one of the other values.
  2. If you specified a sort option that produces sort messages on z/OS, you must direct the sort messages to the batch output stream or a file.

    Allocate DDNAME SYSOUT to the batch output stream or a file on z/OS by inserting the appropriate following DD card into your server batch JCL, if it is not already there. For example, the following DD card allocates DDNAME SYSOUT to the batch output stream:

    //SYSOUT DD SYSOUT=*

Diagnosing External Sort Errors

How to:

Reference:

When an external sort generates an error, you can generate a trace of sort processing and examine the FOCUS return codes and messages to diagnose the problem.

Procedure: How to Trace Sort Processing

When an external sort problem occurs, one of the following messages is generated:

(FOC909)  CRITICAL ERROR IN EXTERNAL SORT.  RETURN CODE IS: xxxx
(FOC1810) External sort not found
(FOC1899) Load of %1 (external-sort module) under %2 failed

In response to these messages, as well as for any other problem with sorting, it is useful to trace sort processing. For information on diagnosing external sort problems, see Diagnosing External Sort Errors.

  1. Allocate DDNAME FSTRACE to the terminal or a file. The following example sends trace output to the terminal:
    //FSTRACE  DD  SYSOUT=*,DCB=(RECFM=FA,LRECL=133,BLKSIZE=133)
  2. Activate the trace by adding the following commands in any supported profile or a FOCEXEC:
    SET TRACEUSER = ON
    SET TRACEON  = SORT/1/FSTRACE

Reference: External Sort Messages and Return Codes

When you receive a FOC909 message, it includes a return code:

(FOC909)  CRITICAL ERROR IN EXTERNAL SORT.  RETURN CODE IS: xxxx

You may also receive one of the following messages:

(FOC1810) External sort not found
(FOC1899) Load of %1 (external-sort module) under %2 failed

The following notes apply when this message or a FOC1800 or FOC1899 message is generated by a TABLE request:

  • The most common value for xxxx is 16. However, return code 16 is issued for a number of problems, including but not limited to the following:
    • Syntax errors.
    • Memory shortage.
    • I/O errors (depending on installation options).
    • Space problems with output.
    • Space problems with work files.

    In order to diagnose the error, you must generate external sort messages (using the instructions in How to Select a Sort Utility and Message Options and How to Trace Sort Processing) and then reproduce the failure.

    For return codes not described below, follow the same procedure described for return code 16.

  • Return code 20 is issued by DFSORT under z/OS if messages were requested (using the MVSMSGDJ option of the SET SORTLIB command), but the SYSOUT DD card is missing. DFSORT terminates after issuing the return code. Under the same conditions, SyncSort attempts to open SYSOUT, producing the following message, and then continues with messages written to the operator or terminal:
    IEC130I SYSOUT DD STATEMENT MISSING.
  • Return code 36 or a FOC1899 message under z/OS means that the external sort module could not be found; check the STEPLIBs allocated.

When REBUILD INDEX invokes an external sort that fails, it generates a message similar to the following:

ERROR OCCURRED IN THE SORT yyyyyyyyzzzzzzzz

In this case, the return code is yyyyyyyy and it is expressed in hex. The final eight digits (zzzzzzzz) should be ignored.

Translate the return code into decimal and follow the instructions for return codes in a TABLE request.

Note also that when a TABLE request generates a non-zero return code from an external sort, FOCUS is terminated. By contrast, when REBUILD INDEX gets a non-zero return code from an external sort, the REBUILD command is terminated but FOCUS continues.

Reference: Responding to an Indication of Inadequate Sort Work Space

Before following these instructions, make sure that external sort messages were generated (for information, see How to Select a Sort Utility and Message Options) and that they clearly show that the reason for failure was inadequate sort work space.

  1. Make an estimate of the number of lines of output the request will produce.
  2. Set the ESTLINES parameter in the request or FOCEXEC. For information, see Provide an Estimate of Input Records or Report Size for Sorting.

    WebFOCUS will pass this estimate to the external sort utility through the parameter list.

    Do not override the DD cards for SORTWKnn, S001WKnn, DFSPARM, or $SORTPARM without direct instructions from technical support. The instructions in How to Select a Sort Utility and Message Options, How to Trace Sort Processing, and Providing an Estimate of Input Records or Report Size for Sorting should provide equivalent capabilities.

Aggregation by External Sort (Mainframe Environments Only)

How to:

Reference:

External sorts can be used to perform aggregation with a significant decrease in processing time in comparison to using the internal sort facility. The gains are most notable with relatively simple requests against large data sources.

When aggregation is performed by an external sort, the statistical variables &RECORDS and &LINES are equal because the external sort products do not return a line count for the answer set. This is a behavior change, and affects any code that checks the value of &LINES. (If you must test &LINES, do not use this feature.)

Syntax: How to Use Aggregation in Your External Sort

SET EXTAGGR = aggropt

where:

aggropt

Can be one of the following:

OFF disallows aggregation by an external sort.

NOFLOAT allows aggregation if there are no floating point data fields present.

ON allows aggregation by an external sort. This value is the default.

Reference: Usage Notes for Aggregating With an External Sort

  • You must use SyncSort or DFSORT.
  • Your query should be simple (that is, it should be able to take advantage of the TABLEF facility). For related information, see Data Retrieval Using TABLEF.
  • The PRINT display command may not be used in the query.
  • SET ALL must be equal to OFF.
  • Only the following column prefixes are allowed: SUM, AVG, CNT, FST.
  • Columns can be calculated values or have a row total.
  • When SET EXTAGGR = NOFLOAT and your query aggregates numeric data, the external sort is not called, and aggregation is performed through the internal sorting procedure.

Example: Changing Output by Using an External Sort for Aggregation

If you use SUM on an alphanumeric field in your report request without using an external sort, the last instance of the sorted fields is displayed in the output, by default. Turning on aggregation in the external sort displays the first record instead. However, you can control the order of display using the SUMPREFIX parameter. For information about the SUMPREFIX parameter, see Changing Retrieval Order With Aggregation.

The following command turns aggregation ON and leaves SUMPREFIX set to LST (the default) and, therefore, displays the last record:

SET EXTAGGR = ON
SET SUMPREFIX = LST
TABLE FILE CAR
SUM CAR BY COUNTRY
END

The output is:

COUNTRY     CAR     
-------     ---     
ENGLAND     TRIUMPH 
FRANCE      PEUGEOT 
ITALY       MASERATI
JAPAN       TOYOTA  
W GERMANY   BMW

Note: SUMPREFIX is described in Changing Retrieval Order With Aggregation (Mainframe Environments Only).

With SUMPREFIX = FST, the output is:

COUNTRY     CAR       
-------     ---       
ENGLAND     JAGUAR    
FRANCE      PEUGEOT   
ITALY       ALFA ROMEO
JAPAN       DATSUN    
W GERMANY   AUDI

Changing Retrieval Order With Aggregation

How to:

The SUMPREFIX parameter allows you to specify which value will be displayed when aggregating an alphanumeric or smart date field in the absence of any prefix operator. The default value is LST, which will return the physical last value within the sort group. FST will return the first physical value in the sort group. MIN and MAX return either the minimum value or maximum value within the sort group.

The SUMPREFIX command allows users to choose the answer set display order.

Syntax: How to Set Retrieval Order

SET SUMPREFIX = {FST|LST|MIN|MAX}

where:

FST

Displays the first value when alphanumeric or smart date data types are aggregated.

LST

Displays the last value when alphanumeric or smart date data types are aggregated. LST is the default value.

MIN

Displays the minimum value in the sort order set by your server code page and configuration when alphanumeric or smart date data types are aggregated.

MAX

Displays the maximum value in the sort order set by your server code page and configuration when alphanumeric or smart date data types are aggregated.

Example: Displaying the Minimum Value for an Aggregated Alphanumeric Field

The following request sets SUMPREFIX to MIN and displays the aggregated PRODUCT_CATEGORY and DAYSDELAYED values as well as the minimum, maximum, first, and last PRODUCT_CATEGORY values. In each row, the aggregated PRODUCT_CATEGORY value matches the MIN.PRODUCT_CATEGORY value. The DAYSDELAYED numeric field is not affected by the SUMPREFIX value and is aggregated.

SET SUMPREFIX = MIN
TABLE FILE WF_RETAIL_LITE
SUM PRODUCT_CATEGORY DAYSDELAYED MIN.PRODUCT_CATEGORY MAX.PRODUCT_CATEGORY 
     FST.PRODUCT_CATEGORY LST.PRODUCT_CATEGORY
BY BRAND
WHERE BRAND GT 'K' AND BRAND LT 'U'
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

Creating a HOLD File With an External Sort (Mainframe Environments Only)

How to:

Reference:

You can use Mainframe external sort packages to create HOLD files, producing substantial savings in processing time. The gains are most notable with relatively simple requests against large data sources.

Syntax: How to Create HOLD Files With an External Sort

SET EXTHOLD = {OFF|ON}

where:

OFF

Disables HOLD files by an external sort.

ON

Enables HOLD files by an external sort. This value is the default.

Reference: Usage Notes for Creating a HOLD File With an External Sort

  • The default setting of EXTSORT=ON must be in effect.
  • EXTHOLD must be ON.
  • The request must contain a BY field.
  • The type of HOLD file created must be a FOCUS, XFOCUS, ALPHA, or BINARY file.
  • Your query should be simple. AUTOTABLEF analyzes a query and determines whether the combination of display commands and formatting options requires the internal matrix. In cases where it is determined that a matrix is not necessary to satisfy the query, you may avoid the extra internal costs associated with creating the matrix. The internal matrix is stored in a file or data set named FOCSORT. The AUTOTABLEF default is ON, in order to realize performance gains.
  • SET ALL must be OFF.
  • There cannot be an IF/WHERE TOTAL or BY TOTAL in the request.
  • If a request contains a SUM command, EXTAGGR must be set ON, and the only column prefixes allowed are SUM. and FST.