File Parser Activity

The File Parser activity is a process starter activity that parses data from text files into XML output.

General Tab

The File Parser activity consists of the General, Description, Advanced, Output, and Fault tabs.

On the General tab, you can specify the required parameters before you use this activity.

The following table lists the configurations in the General tab of the File Parser activity:

Field Literal Value/Module Property? Visual Diff? Description
Name No Yes The name to be displayed as the label for the activity in the process.
Configuration Resource Yes Yes The reference to the Files for Unix and Windows Resource Configuration.
Schema No Yes The schema is based on the XSD generated by the Files for Unix and Windows Resource Configuration selected in the Configuration Resource field. When multiple schemas are listed, only one schema is processed at run time.
Acknowledge Mode No Yes When Acknowledge Mode is set to Client, the File Parser activity waits for a confirmation for each job created by the ActiveMatrix BusinessWorks engine. If a job faults and the confirmation is not received, the same job is reprocessed at a later time. When the file parser runs, jobs that have not completed or that have faulted are processed before the next file is processed.
The following are the options for Acknowledge Mode:
  • Auto
  • Client
By default, it is set to Auto.
Note: Delta Publishing Mode and the Leave as is option for Post Processing are not supported when the Client option is selected for Acknowledge Mode.
Delta Publishing Mode No Yes

When Delta Publishing Mode is enabled, the File Parser activity checks the input file at a preconfigured timer interval, copies any new data to a work file, and then processes and parses the new data.

When this check box is selected, the following fields are greyed out:

  • Recognition Method: the default value is By file name
  • Post Processing: the default value is Leave as is
  • Pre Processing Script File
  • Pre Processing Arguments
  • Post Processing Script File
  • Post Processing Arguments
Delta Flush Interval Yes Yes

This field is available only when the Delta Publishing Mode is selected. The default value is set to 3.

In Delta Publishing Mode, when no new data is appended to an input file after the specified amount of polling, the data remaining in memory is treated as complete and is parsed.
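The polling and flush behavior described for Delta Publishing Mode can be pictured with a small sketch. It is an illustration only, not the plug-in's implementation: the `DeltaTracker` class, its method names, the decision to hand over only complete lines, and the reading of Delta Flush Interval as a count of consecutive polls with no new data are all assumptions made for the example.

```python
class DeltaTracker:
    """Hypothetical helper that tracks how much of a growing file was read."""

    def __init__(self, path, flush_interval=3):
        self.path = path
        self.flush_interval = flush_interval  # assumed: number of idle polls
        self.offset = 0        # bytes of the input file already consumed
        self.idle_polls = 0    # consecutive polls that found no new data
        self.pending = b""     # trailing data not yet terminated by a newline

    def poll(self):
        """Run one polling cycle and return the bytes that are ready to parse."""
        with open(self.path, "rb") as source:
            source.seek(self.offset)
            new_data = source.read()
        self.offset += len(new_data)

        if new_data:
            self.idle_polls = 0
            self.pending += new_data
            # Hand over complete lines only; keep the unterminated tail.
            head, sep, tail = self.pending.rpartition(b"\n")
            self.pending = tail
            return head + sep

        self.idle_polls += 1
        if self.idle_polls >= self.flush_interval and self.pending:
            # No growth for long enough: treat the remainder as complete.
            flushed, self.pending = self.pending, b""
            return flushed
        return b""
```

With the default of 3, a file ending in an unterminated record would have that record released on the third consecutive poll that finds nothing new.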

Process File Mode No Yes The criteria for creating jobs. The following options are available when Delta Publishing Mode is not selected:
  • Record By Record

    In Record By Record mode, one entire record is processed per job.

  • File Based

    In File Based mode, the entire file is processed in one job. When multiple files are present, one job is created for each file.

  • Number of Records

    In Number of Records mode, you can specify the number of records to include in each output job, and the activity processes that many records per job.

When Delta Publishing Mode is selected, the following options are available:
  • Record By Record

    In Record By Record mode, one entire record is processed per job.

  • Number of Records

    In Number of Records mode, you can specify the number of records to include in each output job, and the activity processes that many records per job.

Note: When Process File Mode is set to File Based, or is set to Number of Records with the number of records equal to the total number of records in a file, only one output job is created and the entire output is held in memory. Consider the heap size and allocate memory accordingly when processing large files.
Number of Records Yes Yes This field is available only when Number of Records is selected in the Process File Mode list. In this field, you can specify the number of records to include in each output job. The activity processes that many records per job.
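The three Process File Mode options differ only in how records are grouped into jobs. The following sketch shows that grouping under stated assumptions: `group_into_jobs` is a hypothetical function, records are modeled as list items, and placing any leftover records in a final, smaller job is a guess rather than documented behavior.

```python
def group_into_jobs(records, mode, number_of_records=1):
    """Group parsed records into jobs according to the Process File Mode."""
    records = list(records)
    if mode == "Record By Record":
        # One record per job.
        return [[record] for record in records]
    if mode == "File Based":
        # The whole file becomes a single job, held in memory at once.
        return [records]
    if mode == "Number of Records":
        if number_of_records < 1:
            raise ValueError("number_of_records must be at least 1")
        # Fixed-size batches; the last batch may be shorter (assumption).
        return [
            records[start:start + number_of_records]
            for start in range(0, len(records), number_of_records)
        ]
    raise ValueError(f"unknown mode: {mode!r}")


jobs = group_into_jobs(["r1", "r2", "r3", "r4", "r5"], "Number of Records", 2)
print(jobs)
```

Setting `number_of_records` to the total record count produces the same single large job as File Based, which is the case the memory note warns about.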
Polling Intervals(seconds) Yes Yes

The time, in seconds, between successive file scans.

Input Directory Yes Yes

The File Parser activity searches this directory for files that match the criteria, and then processes and parses them.

Note: The directories used by the plug-in cannot be shared with ActiveMatrix® Adapter for Files for Unix/Win.
This directory is different from the directories specified for the Working Directory and Completion Directory fields. The input directory can be specified as an absolute or a relative path name. A relative path name is resolved against the starting directory of the runtime plug-in.
Note: On Unix, the processing directories (input, working, and completion) must be specified on the same file system. Only the input directory is scanned for files that match the criteria. Files in subfolders of the input directory are ignored.
Recognition Method No Yes

The mechanism for finding the desired input file(s). The following options are available:

  • By file name

Processes the file that exactly matches the value given in the File Name field.

  • By Wildcard via ICU Regular Expressions

Processes the file that matches the ICU regular expression specified in the File Name field.

  • By prefix + extension

Processes the files that match the criteria that you have defined in the File Prefix and File Extension fields.

  • By trigger

Processes the files that match the criteria that you have defined in the File Prefix, File Extension, and Trigger File Extension fields.

Note:
  • When you select the By trigger option, the activity processes the input files only after they are ready. Without this option, the activity might process files in the input directory before third-party applications have finished creating, writing, or closing them.
  • The file name or file prefix cannot contain path information. For details about the recognition method, see File Recognition Methods.
File Name Yes Yes

This field is available in the following cases:

  • When you select By file name from the Recognition Method list, the activity processes the file that exactly matches the value given in this field.
  • When you select By Wildcard via ICU Regular Expressions from the Recognition Method list, you can use ICU regular expressions in this field.
Examples of using ICU regular expressions:
  • Prepare the following files in the input directory: text0.txt, text1.txt, ..., text10.txt.

    If the input filename is text\d\.txt, the input files text0.txt through text9.txt are parsed. The file text10.txt is not parsed, because \d matches exactly one digit.

  • Prepare the following files in the input directory: A6.0.0.txt, A6.1.0.txt, A6.2.0.txt, A6.8.0.txt, A6.0.0.log, and A6.1.0.log.

    If the input filename is A6\.[01]\.0\.(txt|log), the input files A6.0.0.txt, A6.1.0.txt, A6.0.0.log, and A6.1.0.log are parsed.

Note: Wildcards are different from regular expressions and are not supported. For example, *.txt must be specified as .*\.txt in regular expression format.
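To try a pattern before entering it in the File Name field, a quick check such as the following may help. It is a sketch only: it uses Python's `re` module rather than ICU, which behaves the same for the simple patterns shown here but differs from ICU in some advanced syntax. The helper name `matching_files` is hypothetical, and matching against the whole file name is an assumption based on the examples.

```python
import re


def matching_files(pattern, names):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


text_files = [f"text{i}.txt" for i in range(11)]  # text0.txt ... text10.txt
version_files = [
    "A6.0.0.txt", "A6.1.0.txt", "A6.2.0.txt",
    "A6.8.0.txt", "A6.0.0.log", "A6.1.0.log",
]

# \d matches exactly one digit, so text10.txt is excluded.
print(matching_files(r"text\d\.txt", text_files))

# [01] limits the middle version number; (txt|log) allows either extension.
print(matching_files(r"A6\.[01]\.0\.(txt|log)", version_files))

# The regular expression equivalent of the wildcard *.txt.
print(matching_files(r".*\.txt", ["a.txt", "b.log"]))
```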
File Prefix Yes Yes

This prefix is used to locate the input file in the input directory. Any file matching the specified criteria is processed. To activate the file prefix, select By prefix + extension or By trigger from the Recognition Method list.

File Extension Yes Yes

This field is available only when you select By prefix + extension or By trigger from the Recognition Method list.

Trigger File Extension Yes Yes

This field is available only when you select By trigger from the Recognition Method list.

Description Tab

On the Description tab, you can enter a short description for the File Parser activity.

The Visual Diff is supported for the Description tab.

Advanced Tab

The Advanced tab contains the following sections:
  • Processing
  • Processing Script
  • Encoding
The following table describes the fields in the Advanced tab of the File Parser activity.
Field Literal Value/Process Property/Module Property? Visual Diff? Description
Sequence Key No Yes

This field can contain an XPath expression that specifies which processes must run in order. Process instances whose sequencing keys evaluate to the same value are executed sequentially, in the order in which the process instances were created.

Custom Job Id No Yes

This field can contain an XPath expression that specifies a custom ID for the process instance.

The following table describes the fields in the Processing section of the Advanced tab for the File Parser activity.

Field Literal Value/Process Property/Module Property? Visual Diff? Description
Working Directory Yes No

The File Parser activity uses this directory to process files that match the criteria. Depending on the option selected in the Post Processing field, the file is either copied or moved into this directory.

If you select Leave as is from the Post Processing list, the file is copied into this directory. If you select Delete or Move to, the file is moved into this directory and, after processing, is deleted or moved to the completion directory.

Note:
  • For plug-in configurations, if the files processed by the parser activities are independent of each other, the parser activities can share the input, working, and completion directories. Otherwise, these directories must be unique.
  • On Unix, the processing directories (input, working, and completion) must be specified on the same file system. Only the input directory is scanned for files that match the criteria. Files in subfolders of the input directory are ignored.
  • The directories used by the plug-in cannot be shared with ActiveMatrix® Adapter for Files for Unix/Win.
Completion Directory Yes No

This field is available only when you select Move to in the Post Processing list. After the file in the working directory is processed, it is moved to this directory.

Note:
  • On Unix, the processing directories (input, working, and completion) must be specified on the same file system. Only the input directory is scanned for files that match the criteria. Files in subfolders of the input directory are ignored.
  • The directories used by the plug-in cannot be shared with ActiveMatrix® Adapter for Files for Unix/Win.
Progress Directory Yes No

The progress file is written in this directory. If no directory is specified in this field, the progress file is created in the directory where the plug-in is started.

Post Processing No No Specifies an action to apply to the file that is currently in the working directory after File Parser has processed the file. The available postprocessing actions are:
  • Move to

    Moves the file from the Working directory to the Completion directory.

  • Delete

    Deletes the file from the Working directory.

  • Leave as is
    Deletes the file from the Working directory. The file in the Working directory is a copy, so the corresponding file in the Input directory is left as is.
    Note: The Load Balancing feature does not work if Leave as is is selected in the Post Processing field. For more information, see Load Balancing feature.
Add TimeStamp to File Name No No

Appends the date and time to the name of the file that is moved to the completion directory. The format of the date and time is YYYYMMDDHHMMSSmm.

The following table describes the fields in the Processing Script section of the Advanced tab for the File Parser activity.
Field Literal Value/Process Property/Module Property? Visual Diff? Description
Pre Processing Script File Yes No

The name of the script that must be executed before the input file is processed. You can make changes to the input file before it is processed. Click Browse to locate the script file.

For the parser activity, if a preprocessing script does not resolve to its associated program or executable, the File Parser activity cannot invoke the script. To avoid this issue, specify the preprocessing script using the following convention:

command::command_exec,command_file

Example: command::C:\perl\bin\perl.exe,c:\temp\script.pl

In the command_exec and command_file arguments, you must specify the absolute path.

For more information, see Pre and Post Processing Scripts.
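When the field value is generated from a module property or a deployment script, a small helper can guard against relative paths. The sketch below is not part of the plug-in: `build_script_spec` and `is_absolute` are hypothetical names, and the check accepts either Windows-style or POSIX-style absolute paths.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute(path):
    """Accept absolute paths in either Windows or POSIX notation."""
    return PureWindowsPath(path).is_absolute() or PurePosixPath(path).is_absolute()


def build_script_spec(command_exec, command_file):
    """Compose the command::command_exec,command_file value."""
    for path in (command_exec, command_file):
        if not is_absolute(path):
            raise ValueError(f"absolute path required: {path!r}")
    return f"command::{command_exec},{command_file}"


print(build_script_spec(r"C:\perl\bin\perl.exe", r"c:\temp\script.pl"))
```

The same convention applies to the Post Processing Script File field, so the helper can be reused there.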

Pre Processing Arguments Yes No

Arguments to pass to the preprocessing script file. Arguments are strings and are optional.

Syntax: Script_filename Pre Processing Arguments

Example:

script.tcl inputFile0364.txt argument1 argument2...

The parts of this example are as follows:
  • script.tcl is the script filename
  • inputFile0364.txt is the name of the reprocessed file
  • argument1 is the first argument, and is followed by other arguments.

The preprocessing script file reads the input file, renames the file, makes required modifications, and writes to the original filename.
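The read, rename, modify, and write-back sequence can be sketched as follows. The documentation's example uses a Tcl script; this Python version only illustrates the same flow. The `.orig` backup suffix, the `drop_blank_lines` transformation, and the assumption that the input file name arrives as the first command-line argument are all hypothetical.

```python
import os
import sys


def drop_blank_lines(content):
    """Example modification: remove lines that contain only whitespace."""
    lines = content.splitlines(keepends=True)
    return "".join(line for line in lines if line.strip())


def preprocess(input_path, transform):
    """Read the input, rename it, and write the modified data to the original name."""
    with open(input_path, "r", encoding="utf-8") as source:
        content = source.read()

    backup_path = input_path + ".orig"  # hypothetical naming choice
    os.replace(input_path, backup_path)

    with open(input_path, "w", encoding="utf-8") as target:
        target.write(transform(content))
    return backup_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed invocation: script input_file [argument1 argument2 ...]
    preprocess(sys.argv[1], drop_blank_lines)
```

If the backup file stays in the input directory, choose a name that the configured recognition method does not match.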

If five files are in the input directory, the plug-in runs the script five times, once for each file. The plug-in processes the files in ascending alphanumeric order of their names. The sort is case sensitive, and uppercase letters come before lowercase letters.

For example, if the following files exist in the input directory:
1.csv
11.csv
111a.csv
22.csv
11a.csv
11b.csv
22b.csv
The plug-in processes the files in the following order:
1.csv
11.csv
111a.csv
11a.csv
11b.csv
22.csv
22b.csv
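The order in this example is what a plain character-by-character string comparison produces, which is why 111a.csv comes before 11a.csv: the third character 1 sorts before a. The sketch below reproduces the example with Python's default string sort. That the plug-in uses exactly this comparison is an assumption, although it agrees with the listed result.

```python
names = [
    "1.csv", "11.csv", "111a.csv", "22.csv",
    "11a.csv", "11b.csv", "22b.csv",
]

# Default string sorting compares names one character at a time.
processing_order = sorted(names)

for name in processing_order:
    print(name)
```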

During preprocessing, when the preprocessing script finds the file unsuitable for processing, the plug-in does not process the file. The plug-in logs feedback from the preprocessing script.

Post Processing Script File Yes No

The name of the script that must be executed after the input file is processed by the plug-in. Click Browse to locate and load the script.

For the parser activity, if a postprocessing script does not resolve to its associated program or executable, the File Parser activity cannot invoke the script. To avoid this issue, specify the postprocessing script using the following convention:

command::command_exec,command_file

Example: command::C:\perl\bin\perl.exe,c:\temp\script.pl

In the command_exec and command_file arguments, you must specify the absolute path.

For more information, see Pre and Post Processing Scripts.
Post Processing Arguments Yes No
Arguments to pass to the postprocessing script. Arguments are strings and are optional. The sequence of arguments passed to the postprocessing script is determined as follows:
  • The argument sequence contains the name of the file, the arguments specified in the Post Processing Arguments field, and the status. The status indicates success if the parser processes the file successfully, and failure if the parser encounters a problem (for example, a parsing error) while processing the file.
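A postprocessing script might unpack that argument sequence as sketched below. The positions are assumptions read from the description (file name first, status last, configured arguments in between), the function name is hypothetical, and the literal status strings are not given in this document, so the sketch returns the status unchanged.

```python
import sys


def parse_postprocessing_args(args):
    """Split the argument sequence into file name, user arguments, and status."""
    if len(args) < 2:
        raise ValueError("expected at least a file name and a status")
    # Assumed order: file name, configured arguments, then status.
    file_name, *user_args, status = args
    return {"file_name": file_name, "arguments": user_args, "status": status}


if __name__ == "__main__" and len(sys.argv) > 2:
    parsed = parse_postprocessing_args(sys.argv[1:])
    print(parsed["file_name"], parsed["arguments"], parsed["status"])
```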
The following table describes the fields in the Encoding section of the Advanced tab for the File Parser activity.
Field Literal Value/Process Property/Module Property? Visual Diff? Description
File Content Encoding No No

Provides aliases for the following commonly used encodings for file contents:

ASCII, ISO8859-1, UTF16_BigEndian, UTF16_LittleEndian, UTF-8, Shift JIS(CP943), Shift JIS (TIBCO), EUC-JP, Big5, and Other.

Note: When an invalid or unsupported encoding string value is specified, an error is displayed at run time.
File Content Encoding Other Yes No

This field is available only when you select Other in the File Content Encoding list. For more information, see File Content Encoding.

End of Line No No Select the method according to how the lines in the input file are separated.
  • System

    Uses a carriage return (new line) to mark the end of a line.

  • User Defined
    Uses custom end of line characters to mark the end of a line.
    Note: Currently, no facility is provided to distinguish custom end of line characters that are not actual characters.
  • System and User Defined

    Uses a combination of carriage returns and custom characters to mark the end of a line.

User Defined EOL Yes No
This field is available only when the End of Line field is not System. Enter the characters to mark the end of a line.
Note: When the Delimiter and the User Defined EOL fields are the same, the parser activity cannot differentiate between them. Therefore, the Delimiter and User Defined EOL fields must always be different.
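The three End of Line options can be compared with a short sketch that splits raw text into records. It is an approximation under stated assumptions: `split_records` is a hypothetical function, the System option is modeled as any of CR LF, LF, or CR, and empty records are dropped.

```python
import re

SYSTEM_EOL = r"\r\n|\n|\r"  # assumed meaning of the System option


def split_records(text, mode, user_eol=""):
    """Split text into records according to the End of Line setting."""
    if mode != "System" and not user_eol:
        raise ValueError("User Defined EOL is required for this mode")
    if mode == "System":
        pattern = SYSTEM_EOL
    elif mode == "User Defined":
        pattern = re.escape(user_eol)
    elif mode == "System and User Defined":
        pattern = SYSTEM_EOL + "|" + re.escape(user_eol)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [record for record in re.split(pattern, text) if record]


print(split_records("a,1\nb,2\n", "System"))
print(split_records("a,1;;b,2;;", "User Defined", ";;"))
print(split_records("a;;b\nc", "System and User Defined", ";;"))
```

If `user_eol` were the same string as the field delimiter, every field would be split off as its own record, which is why the two settings must differ.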

Output Tab

The FileParser complex object contains the complete output of the File Parser activity. It includes the header and body complex objects.

The header complex object contains the metadata of the input file.

The following table describes the fields in the header node:

Output Item     Data Type  Description
fullName        string     The full file path of the input file.
fileName        string     The file name of the input file.
location        string     The location of the input file.
readProtected   boolean    Returns true if the input file is read protected.
writeProtected  boolean    Returns true if the input file is write protected.
size            integer    The size of the input file.
lastModified    string     The timestamp of the input file when it was last modified.
eof             boolean    Returns true if the FileParser Output Job contains the last record of the input file.

The fields under the body complex object depend on the schema selected in the General tab of the File Parser activity.

Fault Tab

On the Fault tab, the following exceptions are available for selection:
  • FileParserException
  • RecordParserException

FileParserException generates an error and causes the activity to stop. It contains the following fields:

Field         Type    Description
msg           string  The error message description returned by the plug-in.
msgCode       string  The error code returned by the plug-in.
errorMessage  string  The error message returned by the plug-in.

RecordParserException generates an error but still allows the activity to continue. The fault is generated only when an entire record in the input file is incorrect. It applies only when Process File Mode is set to Record By Record. It contains the following fields:

Field         Type    Description
msg           string  The error message description returned by the plug-in.
msgCode       string  The error code returned by the plug-in.
errorRecords  string  The error records returned by the plug-in.