File Collection Merging

File merging is used for log sources that produce a large number of log files.

The file merging process concatenates multiple files into a single log file, thereby increasing the efficiency of file processing. File merging does not work for files pulled by LogLogic® Universal Collector and sent to LogLogic LMI.

File merging can be used:

  • for any native file-pull rule
  • when a rule execution results in many files
  • for a pull of a single archive file, which may be compressed, such as .tar, .taz, .tar.Z, .tar.gz, .tar.bz2, or .tgz
  • when a pulled archive contains more than 5 files smaller than 1 MB in size. Otherwise, merging overhead becomes comparable to the overall file processing time.
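The last condition can be checked on a sample archive before merging is configured. The following Python sketch is illustrative only and is not part of LogLogic LMI: the function names are hypothetical, "1 MB" is read as 1 MiB, and the condition is read as "more than 5 member files, each smaller than 1 MB". Python's tarfile module opens .tar, .tar.gz, .tgz, and .tar.bz2 archives transparently, but it does not read .Z (compress) archives.

```python
import tarfile

ONE_MB = 1024 * 1024  # assumption: "1 MB" is read as 1 MiB


def count_small_members(archive_path, size_limit=ONE_MB):
    """Count the regular files in a tar archive that are smaller than size_limit bytes."""
    # Mode "r:*" lets tarfile detect gzip or bzip2 compression automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        return sum(
            1
            for member in archive.getmembers()
            if member.isfile() and member.size < size_limit
        )


def merging_likely_worthwhile(archive_path, min_files=5):
    """Return True if the archive holds more than min_files small files."""
    return count_small_members(archive_path) > min_files
```

Running merging_likely_worthwhile() against a representative archive from the log source gives a rough indication of whether the archive fits the case described above.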

You can specify archive types such as .zip or .bz2 for merging. When unpacked, the archive creates multiple small files in LogLogic LMI.

Note:
  • See "Sources that are not supported" for the list of sources for which file merging is not supported.
  • Do not configure file merging when you want to leave the pulled files unmodified in any way.

To optimize file processing, the merged file should contain logs sorted by log timestamp. It is assumed that each small file is already sorted by time. During merging, the files are sorted by file name before they are concatenated.

The following file merging configuration parameters are specified in the fc.conf file. Whitespace is not allowed within the parameters.

Keyword: Merge=<FileNamePattern>
Description: Specifies the file name prefix of the small files. Files whose names match the prefix are merged. The value of this parameter must be a literal string.
Example: Merge=filedata_
Selects files such as filedata_1.txt and filedata_2.txt for merging.

Keyword: Merge=*
Description: The wildcard * indicates that all files should be merged.

Keyword: Rule=<RuleName>
Description: Optional. Limits the merge operation to a specific file-pull rule.
Example: Rule=ruleA
Note:
  • If Rule= is not present, the merge applies to all pull rules on LogLogic LMI.
  • Only one rule can be specified. If more than one rule should use the same merging, the Merge=/Rule=/SearchKey= configuration block must be duplicated in the configuration file for each file-pull rule.
  • Whitespace characters are not allowed in the rule name.

Keyword: SearchKey=<SortableFieldSearchPattern>
Description: Optional.

Specifies rules for finding a sortable portion of the file name. That portion is used to sort all text files in the /loglogic/data/filecollector/archiver folder, which are unpacked from the .tar file. Doing so helps simplify the processing of the merged file.

For example, if a file in the archiver folder is named 19_192.168.1.10_29309_1461094448_7.txt, then SearchKey=_3 skips three underscore characters to reach the timestamp 1461094448 in the file name.

When deciding on the SearchKey value, refer to the structure of the pulled archive file on the log source server, and derive the SearchKey from the original file names on the log source server.

The most obvious sortable portion is a timestamp embedded in the file name. If the timestamp is at the beginning of the file name, SearchKey is not needed. If the timestamp is in the middle of the file name, SearchKey specifies the printable character that precedes the timestamp. If several instances of this character precede the timestamp, specify their number immediately after the character. For example, in the file name file_A_B_12345678.txt, the timestamp 12345678 is preceded by an underscore. Because three underscores occur before the timestamp, SearchKey=_3 ensures that the timestamp is extracted after the third underscore.

Usually, the user has no control over the names of the small files. Where the names can be controlled, the timestamp should appear at the beginning of the file name.

Sorting is performed on the portion of the file name that begins at the fragment located by SearchKey, not on the entire file name. Sorting is not mandatory, but it helps to optimize log processing. When the files are sorted before merging, the result is a single merged file in which the logs are sorted. This makes file processing more efficient and the parsed data better aggregated.

Note:
  • File merge works even if logs are not sorted.
  • File sorting affects subsequent regex search. During regex search, the results are returned in the order in which they were written to BFQ files.
  • Sorting during merging does not affect the result of the Index search because the Index search sorts results by time.
  • File parsing is not affected by sorting during file merging.
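
The SearchKey rule described above can be sketched in Python as follows. This is an illustration of the documented behavior, not the LogLogic LMI implementation: the function names are hypothetical, a SearchKey with no number is assumed to mean one occurrence, a file name with too few occurrences of the character is assumed to fall back to the whole name, and a plain string comparison is assumed for sorting.

```python
def parse_search_key(search_key):
    """Split a SearchKey value such as "_3" into (character, count)."""
    if not search_key:
        return None, 0
    char, digits = search_key[0], search_key[1:]
    # Assumption: a character with no number means a single occurrence.
    return char, int(digits) if digits else 1


def sort_fragment(file_name, search_key=None):
    """Return the part of file_name that follows the Nth occurrence of the SearchKey character."""
    char, count = parse_search_key(search_key)
    if char is None:
        return file_name  # no SearchKey: sort on the whole file name
    pos = -1
    for _ in range(count):
        pos = file_name.find(char, pos + 1)
        if pos == -1:
            return file_name  # assumption: fall back to the whole name
    return file_name[pos + 1:]


def merge_order(file_names, search_key=None):
    """Order file names the way they would be concatenated during merging."""
    return sorted(file_names, key=lambda name: sort_fragment(name, search_key))


names = [
    "19_192.168.1.10_29309_1461094500_9.txt",
    "7_192.168.1.10_29309_1461094448_7.txt",
]
print(merge_order(names, "_3"))  # ordered by the embedded timestamp
print(merge_order(names))        # ordered by the whole file name
```

With SearchKey=_3 the file carrying timestamp 1461094448 comes first. Without a SearchKey the names are compared from their first character, so the file whose name starts with 19_ comes first even though its timestamp is later.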

Configuring File Collection Merging

Procedure

  1. Create the file /loglogic/conf/fc.conf and add the following line to the file:
    Merge=* Rule=RuleName SearchKey=_3
  2. Run the command:
    $ mtask -s engine_filecollector restart
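
When several file-pull rules need the same merge settings, the configuration has to be repeated once per rule. The following Python sketch generates such entries. It is a hypothetical helper, not a LogLogic LMI tool: the function names are invented, and the one-line-per-rule layout simply mirrors the example line in step 1, so verify the layout against your LogLogic LMI version before using the output.

```python
def merge_line(merge, rule=None, search_key=None):
    """Build one fc.conf entry in the form shown in step 1 of the procedure."""
    parts = []
    for key, value in (("Merge", merge), ("Rule", rule), ("SearchKey", search_key)):
        if value is None:
            continue  # Rule and SearchKey are optional
        if any(ch.isspace() for ch in value):
            # Whitespace is not allowed within the parameters.
            raise ValueError(f"{key} value must not contain whitespace: {value!r}")
        parts.append(f"{key}={value}")
    return " ".join(parts)


def merge_config(merge, rules, search_key=None):
    """Repeat the same merge settings once for every file-pull rule."""
    return "\n".join(merge_line(merge, rule, search_key) for rule in rules) + "\n"


# ruleA and ruleB are placeholder rule names.
print(merge_config("filedata_", ["ruleA", "ruleB"], "_3"), end="")
```

The generated text can be reviewed and then placed in /loglogic/conf/fc.conf before the restart in step 2.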

Sources that are not supported

File merging is not supported for the following sources:

  • Data collected via TIBCO LogLogic® Universal Collector
  • JDBC-based file pulls
  • Files with any of the following extensions: .gz, .bz2, .zip, .z, .Z
  • .tar.z and .Z files: Merging is not supported because LogLogic LMI checks the last extension of a file name. Because of their .z and .Z extensions, these files are treated purely as compressed files and not as archive files.