Parse Data
The Parse Data activity takes a text string or input from a file and processes it by converting it into a schema tree based on the specified Data Format shared resource.
General
You can use any mechanism to obtain or create a text string for processing. For example, you can use the Read File activity to obtain text from a file. You can also use this activity to specify a text file to read.
You can use this activity in various scenarios. For example, a user has a file comprising multiple lines with comma-separated values (as in data obtained from a spreadsheet) and this data has to be inserted into a database table. In such a scenario, read and parse the file into a data schema with the Parse Data activity. Then, use the JDBC Update activity to insert the data schema into a database table.
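As a rough illustration of the scenario above outside the product, the following Python sketch parses comma-separated text into records and inserts them into a table. It is not how the Parse Data or JDBC Update activities are implemented: the sample text, the field names, the `people` table, and the use of an in-memory SQLite database in place of a JDBC connection are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical input: comma-separated lines, as exported from a spreadsheet.
TEXT = "1,Alice,30\n2,Bob,25\n"


def parse_rows(text, field_names):
    """Turn delimited text into a list of dicts keyed by field name."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(field_names, fields)) for fields in reader]


def insert_rows(conn, rows):
    """Insert parsed rows into a hypothetical 'people' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people (id TEXT, name TEXT, age TEXT)"
    )
    conn.executemany(
        "INSERT INTO people (id, name, age) VALUES (:id, :name, :age)", rows
    )
    conn.commit()


rows = parse_rows(TEXT, ["id", "name", "age"])
conn = sqlite3.connect(":memory:")  # stand-in for a real database connection
insert_rows(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
```

In the product, the field names and types come from the Data Format shared resource rather than from a hard-coded list.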
The General tab has the following fields.
Field | Module Property | Description |
---|---|---|
Name | No | The name to be displayed as the label for the activity in the process. |
Data Format | No | The Data Format shared resource to use when parsing the text input. |
Input Type | No | Specify the type of input for this activity.
Input can either be String or File. If the input is a text string, provide the string to the text input item. If the input is a file, provide the file name and location to the fileName input item. |
Encoding | Yes | The encoding of the input file.
To enable this field, select the File option in the Input Type field. Any valid Java encoding name can be used. |
Skip Blank Spaces | No | Select this check box to skip any empty records when parsing the text input.
When this check box is not selected, parsing stops at the first blank line encountered in the input. |
Manually Specify Start Record | No | You can specify the record in the input where you want to start parsing.
This is useful if you have a large number of records and you want to read the input in parts (to minimize memory usage). Selecting this check box displays the startRecord input item. For more information about how to read the input stream in parts, see Parsing a Large Number of Records. |
Strict Validation | No | Validates that every input line contains the number of fields specified for the fixed-format text.
For example, if the format states that there are three fields per line and this check box is selected, all lines in the input must contain three fields. |
Continue On Error | No | Continues parsing with the next record in the input after an error is encountered.
If an error occurs, the error information is separated from the output of the successfully parsed records and is provided in the output schema of the activity. When this check box is not selected, the Parse Data activity stops parsing as soon as an error is encountered while parsing the records in the input. Regardless of whether this check box is selected, the Parse Data activity stops when any data validation error occurs. |
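The general idea behind Continue On Error can be sketched in Python as follows. This is only a model under stated assumptions: the function `parse_all`, the comma delimiter, and the field-count check are hypothetical, and the sketch does not model which failures the product treats as recoverable parse errors and which as validation errors that always stop the activity.

```python
def parse_all(lines, expected_fields, continue_on_error):
    """Parse comma-separated lines; return (rows, error_rows).

    A line with the wrong number of fields counts as a failed record
    (an assumption made for this sketch only).
    """
    rows, error_rows = [], []
    for line in lines:
        fields = line.split(",")
        if len(fields) != expected_fields:
            if not continue_on_error:
                # Without continue-on-error, stop at the first bad record.
                raise ValueError(
                    f"expected {expected_fields} fields: {line!r}"
                )
            # Keep the raw input of the failed record, apart from the
            # successfully parsed rows, and move on to the next record.
            error_rows.append(line)
            continue
        rows.append(fields)
    return rows, error_rows


good, bad = parse_all(["a,b,c", "x,y", "d,e,f"], 3, continue_on_error=True)
```

Here `good` holds the two well-formed records and `bad` holds the raw text of the record that failed, which mirrors how error information is kept separate from the successfully parsed output.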
Input
The following is the input for the activity.
Input Item | Datatype | Description |
---|---|---|
text | string | The text string to parse.
This input item is available only when String is specified in the Input Type field of the General tab. |
fileName | string | The location and name of the file to read. The file's content is used as the input text string for this activity.
This input item is available only when File is specified in the Input Type field of the General tab. |
startRecord | number | The line number of the input stream at which to begin parsing. All lines before the specified line are ignored. This input item is available only if the Manually Specify Start Record check box on the General tab is selected.
The input stream begins with line number 1 (one). This is useful for reading the input stream in parts to minimize memory usage. For more information, see Parsing a Large Number of Records. |
noOfRecords | number | The number of records to read from the input stream. Specify -1 if you want to read all records in the input stream.
This is useful for reading the input stream in parts to minimize memory usage. For more information, see Parsing a Large Number of Records. |
SkipHeaderCharacters | integer | The number of characters to skip when parsing. You can skip over any file headers or other unwanted information. |
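The way these input items narrow down what is parsed can be modelled with a short Python sketch. The function `select_records` is hypothetical. The sketch assumes one record per line and assumes that header characters are skipped before lines are counted; the product may order these steps differently.

```python
def select_records(text, start_record=1, no_of_records=-1,
                   skip_header_characters=0):
    """Return the lines of `text` selected by the three input items."""
    # Drop the leading header characters first (assumed order).
    body = text[skip_header_characters:]
    lines = body.splitlines()
    # startRecord is 1-based: record 1 is the first line.
    begin = start_record - 1
    if no_of_records == -1:
        # -1 means "read all remaining records".
        return lines[begin:]
    return lines[begin:begin + no_of_records]


sample = "HDR\na\nb\nc\nd\n"
# Skip the 4-character header "HDR\n", then read 2 records from record 2.
part = select_records(sample, start_record=2, no_of_records=2,
                      skip_header_characters=4)
```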
Output
The following is the output of the activity.
Output item | Datatype | Description |
---|---|---|
Rows | complex | This output item contains the list of parsed lines from the input. This is useful to determine the number of records parsed by this activity.
The schema specified by the Data Format resource is contained in this output item. |
schema | complex | The schema containing the data from the parsed input text. This output item contains zero or more parsed records. |
ErrorRows | | This output item is available when you select Continue On Error and errors occur while parsing the records in the input.
This item contains the list of error lines for the records from the input that failed parsing. The raw input data is put in the error string. |
done | boolean | true if no more records are available for parsing; false if more records are available.
Use this output item to check whether any records remain in the input stream when reading the input in parts to conserve memory. |
Fault
The Fault tab lists the possible exceptions generated by this activity. For more information about error codes and the corrective actions to take, see the TIBCO ActiveMatrix BusinessWorks™ Error Codes guide.
Parsing a Large Number of Records
The input for this activity is placed in a process variable and takes up memory as it is being processed. When reading a large number of records from a file, the process may consume significant machine resources. To avoid excessive memory use, you may want to read the input in parts, parsing and processing a small set of records before moving on to the next set of records.
- Select and drop the Parse Data activity on the process editor.
- On the General tab, specify the fields and select the Manually Specify Start Record check box.
- Select the Parse Data activity and click the group icon on the tool bar to create a group containing the Parse Data activity.
- Specify Repeat Until True Loop as the Group action, and specify an index name (for example, "i").
The loop must exit when the done output item for the Parse Data activity is set to true. For example, the condition for the loop can be set to the following:
string($ParseData/Output/done) = string(true())
- Set the noOfRecords input item for the Parse Data activity to the number of records you want to process for each execution of the loop.
If you do not select the Manually Specify Start Record check box on the General tab of the Parse Data activity, the loop processes the specified noOfRecords with each iteration, until there are no more input records to parse.
You can optionally select the Manually Specify Start Record check box to specify the startRecord on the Input tab. If you do this, you must create an XPath expression that specifies the starting record to read with each iteration of the loop. For example, the count of records in the input starts at zero (0), so the startRecord input item could be set to the current value of the loop index minus one: $i - 1.
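The repeat-until-done pattern described in this section can be sketched in Python. The functions `parse_chunk` and `process_in_parts` are hypothetical stand-ins for the Parse Data activity and the surrounding group. The sketch assumes one record per line, a loop index that starts at 1, and record numbering that starts at 1 as described for the startRecord input item; adjust the offset if your records are numbered differently.

```python
def parse_chunk(lines, start_record, no_of_records):
    """Parse one batch of records; return (rows, done)."""
    begin = start_record - 1            # convert 1-based record to 0-based index
    end = begin + no_of_records
    rows = [line.split(",") for line in lines[begin:end]]
    done = end >= len(lines)            # True when no records remain
    return rows, done


def process_in_parts(lines, chunk_size):
    """Repeat until `done` is true, handling chunk_size records per pass."""
    processed = []
    i = 1                               # loop index (assumed to start at 1)
    while True:
        # Starting record for this iteration, derived from the loop index.
        start_record = (i - 1) * chunk_size + 1
        rows, done = parse_chunk(lines, start_record, chunk_size)
        processed.extend(rows)          # process this batch before the next
        if done:
            break
        i += 1
    return processed, i


records = ["1,a", "2,b", "3,c", "4,d", "5,e"]
all_rows, iterations = process_in_parts(records, chunk_size=2)
```

Only one batch of parsed rows is produced per iteration, which is the point of reading the input in parts. This sketch accumulates every batch in `processed` purely so the result can be inspected; a real process would handle each batch and discard it.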