File Collection Merging
File merging is used for log sources that produce a large number of log files.
The file merging process concatenates multiple files into a single log file, thereby increasing the efficiency of file processing. File merging does not work for files pulled by LogLogic® Universal Collector and sent to LogLogic LMI.
File merging can be used:
- for any native file-pull rule
- when a rule execution results in many files
- for a pull of a single archive file, which may be compressed, such as .tar, .taz, .tar.Z, .tar.gz, .tar.bz2, or .tgz
- when a pulled archive contains more than 5 files smaller than 1 MB each. Below this threshold, the merging overhead becomes comparable to the overall file processing time.
You can specify archive types such as .zip or .bz2 for merging. When unpacked, the archive creates multiple small files in LogLogic LMI.
- See "Sources that are not supported" for the list of sources that cannot use file merging.
- Do not configure file merging when you want to leave the pulled files unmodified in any way.
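As a rough illustration of the size guideline above (more than 5 files smaller than 1 MB), the following Python sketch decides whether merging is likely to pay off for a given archive. The function `merging_worthwhile` and its constants are hypothetical and not part of LogLogic LMI; 1 MB is assumed to mean 1,048,576 bytes.

```python
# Hypothetical helper: LogLogic LMI does not expose this function. It only
# illustrates the "more than 5 files smaller than 1 MB" guideline.
SMALL_FILE_LIMIT = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
MIN_SMALL_FILES = 5             # merging pays off only above this count


def merging_worthwhile(file_sizes: list[int]) -> bool:
    """Return True if the archive holds more than 5 files under 1 MB."""
    small_files = sum(1 for size in file_sizes if size < SMALL_FILE_LIMIT)
    return small_files > MIN_SMALL_FILES
```

For example, an archive of six 200 KB files qualifies, while an archive of five such files, or of ten 5 MB files, does not.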
To optimize file processing, the merged file should contain logs sorted by their timestamps. Each small file is assumed to be already sorted by time. While concatenating the files, file merging sorts them by file name.
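The concatenation step can be sketched in Python as follows. This is a minimal in-memory model, not LogLogic LMI's implementation: `merge_contents` is a hypothetical name, file contents are passed as a mapping of file name to bytes, and adding a newline after a file that lacks a trailing one is an assumption of the sketch.

```python
from collections.abc import Mapping


def merge_contents(files: Mapping[str, bytes]) -> bytes:
    """Concatenate file contents in ascending file-name order."""
    parts = []
    for name in sorted(files):  # sort by file name, as described above
        data = files[name]
        parts.append(data)
        # Assumption: stop the last line of one file from running into the
        # first line of the next by adding a newline when one is missing.
        if data and not data.endswith(b"\n"):
            parts.append(b"\n")
    return b"".join(parts)
```

With timestamp-prefixed names such as 20240101.log and 20240102.log, sorting by name also sorts the files by time, so the merged output stays in time order as long as each input file is.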
The following file merging configuration parameters are specified in the fc.conf file. Whitespace is not allowed within the parameters.
The SearchKey parameter specifies how to find a sortable portion of the file name. To choose a SearchKey value, examine the structure of the pulled archive file on the log source server and derive the SearchKey from the original file names there.
The most obvious sortable portion is a timestamp embedded in the file name. If the timestamp is at the beginning of the file name, SearchKey is not needed. If the timestamp is in the middle of the file name, SearchKey specifies a printable character that precedes it. If several instances of this character precede the timestamp, specify their number immediately after the character. For example, in the file name file_A_B_12345678.txt, the timestamp 12345678 is preceded by three underscores, so SearchKey=_3 extracts the portion after the third (last) underscore.
Usually, the user has no control over the names of the small files. Whenever possible, however, the timestamp should appear at the beginning of the file name, in which case no SearchKey is needed.
Sorting is performed on the portion of the file name that SearchKey locates, not on the entire file name. Sorting is not mandatory, but it helps to optimize log processing: when the files are sorted before they are merged, the result is a single file in which the logs are sorted. This makes file processing more efficient and the parsed data better aggregated.
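The following Python sketch shows why sorting on the located portion matters when file names start with something other than the timestamp. The host-prefixed file names and the `key_after_first` helper are hypothetical; the key here is simply the text after the first underscore, roughly what a SearchKey of `_1` would select.

```python
def key_after_first(char: str):
    """Build a sort key: the part of the name after the first `char`."""
    def key(name: str) -> str:
        _, sep, rest = name.partition(char)
        return rest if sep else name  # no separator: use the whole name
    return key


names = ["hostB_20240101.log", "hostA_20240102.log", "hostB_20240103.log"]

# Sorting on the entire name groups by host and breaks time order:
by_full_name = sorted(names)
# -> ['hostA_20240102.log', 'hostB_20240101.log', 'hostB_20240103.log']

# Sorting on the timestamp portion restores time order:
by_fragment = sorted(names, key=key_after_first("_"))
# -> ['hostB_20240101.log', 'hostA_20240102.log', 'hostB_20240103.log']
```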
- File merge works even if logs are not sorted.
- File sorting affects subsequent regex searches: regex search returns results in the order in which they were written into BFQ files.
- Sorting during merging does not affect the results of an Index search, because Index search sorts results by time.
- File parsing is not affected by sorting during file merging.
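The difference between the two search types can be modeled with a toy Python example. This is not how LogLogic LMI stores or searches data: `Record`, `regex_search`, and `index_search` are hypothetical, and a Python list stands in for the records in the order they were written to BFQ files.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: int  # hypothetical epoch seconds
    message: str


def regex_search(records: list[Record], pattern: str) -> list[Record]:
    """Scan records in stored (write) order; matches keep that order."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(r.message)]


def index_search(records: list[Record], term: str) -> list[Record]:
    """Return matching records ordered by timestamp, whatever the write order."""
    return sorted((r for r in records if term in r.message),
                  key=lambda r: r.timestamp)
```

If files were merged without sorting, for example `[Record(300, "login ok"), Record(100, "login failed"), Record(200, "logout")]`, a regex search for `login` returns the 300 record before the 100 record, whereas the index search returns them in time order.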
Sources that are not supported
File merging is not supported for the following sources:
- Data collected via TIBCO LogLogic® Universal Collector
- JDBC-based file pulls
- Files with any of the following extensions: .gz, .bz2, .zip, .z, .Z
- .tar.z and .Z files: Merging of .tar.z and .Z files is not supported, because their final .z or .Z extension causes them to be treated purely as compressed files. LogLogic LMI checks only the last extension, so these files are not treated as archive files.
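The "last extension only" check can be illustrated with Python's standard `pathlib` module. This only demonstrates the distinction between the last extension and the full chain of extensions; `last_extension` is a hypothetical helper, not LogLogic LMI's implementation.

```python
from pathlib import PurePath


def last_extension(file_name: str) -> str:
    """Return only the final extension, which is what the check examines."""
    return PurePath(file_name).suffix


for name in ["app.tar.z", "app.Z", "app.tar"]:
    # suffixes lists every extension; suffix is only the last one
    print(name, PurePath(name).suffixes, "->", last_extension(name))
# app.tar.z ['.tar', '.z'] -> .z
# app.Z ['.Z'] -> .Z
# app.tar ['.tar'] -> .tar
```

Although app.tar.z contains a tar archive, its last extension is .z, so a check that looks only at that extension sees a compressed file rather than an archive.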