Parsing a Large Number of Records

The input for this activity is placed in a process variable and occupies memory while it is being processed. When the activity reads a large number of records from a file, the process can consume significant machine resources. To avoid using too much memory, you can read the input in parts: parse and process a small set of records, then move on to the next set.
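Reading in parts is a general technique and is not specific to Parse Data. The following sketch is written in Python rather than BusinessWorks. It reads line-oriented records in fixed-size batches, so only one batch is held in memory at a time. The function name read_in_batches and the one-record-per-line format are assumptions made for the example.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size records (hypothetical helper).

    Assumes one record per line. Only the current batch is kept in memory.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(lines)  # a file object is already an iterator, so it is read lazily
    while True:
        batch = [line.rstrip("\n") for line in islice(it, batch_size)]
        if not batch:  # no more input records to parse
            return
        yield batch


if __name__ == "__main__":
    data = io.StringIO("a\nb\nc\nd\ne\n")  # stands in for a large input file
    for batch in read_in_batches(data, 2):
        print(batch)  # process this small set of records, then move on
```

Because itertools.islice pulls records lazily from the underlying iterator, memory use is bounded by the batch size rather than by the size of the file.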

This procedure is a general guideline for creating a loop group that parses a large set of input records in parts. You can modify it to add further processing of the records, or change the XPath expressions to suit your business process. To process a large number of records, do the following:

Procedure

  1. Select the Parse Data activity and drop it onto the process editor.
  2. On the General tab, specify the fields. Select the Manually Specify Start Record check box only if you want to set the start record yourself (see step 5).
  3. Select the Parse Data activity and click the group icon on the toolbar to create a group containing the Parse Data activity.
  4. Specify Repeat Until True Loop as the Group action, and specify an index name (for example, "i").
    The loop must exit when the Parse Data activity reaches the end of the input (EOF), that is, when its done output item is set to true. For example, you can set the loop condition to: string($ParseData/Output/done) = string(true())
  5. Set the noOfRecords input item for the Parse Data activity to the number of records you want to process for each execution of the loop.

    If you do not select the Manually Specify Start Record check box on the General tab of the Parse Data activity, each iteration of the loop parses the next noOfRecords records, continuing from where the previous iteration stopped, until there are no more input records to parse.

    You can optionally select the Manually Specify Start Record check box and specify startRecord on the Input tab. In that case, you must create an XPath expression that sets the correct starting record for each iteration of the loop. Record numbering in the input starts at zero while the loop index starts at one, so on the first iteration startRecord must be $i - 1. Because each iteration reads noOfRecords records, the general expression is ($i - 1) * noOfRecords; for example, ($i - 1) * 100 when noOfRecords is 100. The simpler expression $i - 1 is correct only when noOfRecords is 1.
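To make the loop logic concrete, the following Python sketch simulates it outside BusinessWorks. ParseDataSim, parse, start_record_for, and run_loop are hypothetical names and are not part of any TIBCO API. The simulator sets done as soon as the last record has been read, which may differ in detail from the real activity. The sketch also assumes that the loop index starts at 1. It shows both modes. With manual_start=False, the simulator remembers its position between iterations. With manual_start=True, each iteration computes a zero-based start record as (i - 1) * noOfRecords, which reduces to i - 1 only when one record is read per iteration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ParseDataSim:
    """Hypothetical stand-in for the Parse Data activity (not a TIBCO API)."""
    records: List[str]  # the full input, already split into records
    _position: int = field(default=0, init=False)  # where the last call stopped

    def parse(self, no_of_records: int, start_record: Optional[int] = None) -> Tuple[List[str], bool]:
        # start_record=None mimics leaving Manually Specify Start Record cleared.
        start = self._position if start_record is None else start_record
        end = min(start + no_of_records, len(self.records))
        self._position = end
        done = end >= len(self.records)  # simplified end-of-input flag
        return self.records[start:end], done


def start_record_for(i: int, no_of_records: int) -> int:
    """Zero-based start record for loop index i (assumed to start at 1)."""
    return (i - 1) * no_of_records


def run_loop(records: List[str], no_of_records: int, manual_start: bool) -> List[List[str]]:
    """Repeat-until-true loop: the body runs at least once, then checks done."""
    if no_of_records < 1:
        raise ValueError("no_of_records must be at least 1")
    activity = ParseDataSim(records)
    batches: List[List[str]] = []
    i = 0
    while True:
        i += 1  # loop index, like $i in the group
        start = start_record_for(i, no_of_records) if manual_start else None
        batch, done = activity.parse(no_of_records, start)
        batches.append(batch)  # process the batch here
        if done:  # corresponds to string($ParseData/Output/done) = string(true())
            break
    return batches


if __name__ == "__main__":
    data = [f"r{n}" for n in range(5)]
    print(run_loop(data, 2, manual_start=False))
    print(run_loop(data, 2, manual_start=True))
```

Both calls produce the same three batches. If the manual mode used i - 1 instead, the second and third iterations would start at records 1 and 2, and the batches would overlap.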