Types of Parsers in Advanced Data Models

The following types of parsers are available in advanced data models:

Key-value Parser

This parser uses simple key-value pair parsing rules to extract keys and values. The parser recognizes patterns such as k1=v1, k2=v2, k3=v3. You can use key-value pair separators, for example, space, comma (,), or semi-colon (;), and key and value separators, for example, equal sign (=) or colon (:). Separators can be either one or more characters that have to be matched exactly or they can be regular expressions.

When referring to a value in a column expression, it is referred to as $<key name>. For example, for a key named ‘user’, the value is referred to as $user.
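
For illustration only, the following Python sketch (not the LogLogic LMI implementation) shows how key-value pairs separated by commas and equal signs could be split, and how the value for a key named ‘user’ then corresponds to $user; the sample event is invented:

import re

event = "user=jsmith, action=login, result=success"

# Split on the pair separator (comma), then split each pair on the
# key/value separator (equal sign).
pairs = {}
for pair in re.split(r",\s*", event):
    key, _, value = pair.partition("=")
    pairs[key.strip()] = value.strip()

print(pairs["user"])    # the value referenced as $user -> "jsmith"
print(pairs["action"])  # the value referenced as $action -> "login"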

Regular expressions can also be used to parse data from the beginning and ending of the event. This can be useful when parsing events that either start with or end with data that is not in the key-value pair format. If these regular expressions contain named groups, then those groups are extracted and can be used to populate columns.

You can specify the name of the last key in the data. Any data after that last key is treated as the value of that last key. This can be useful in situations where the last value in the data contains characters that might be interpreted as separators.
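
As a rough Python illustration (again, not the product code), a prefix regular expression with a named group and a declared last key might be handled along the following lines; the 'host' group, the 'msg' key, and the sample event are invented for the example:

import re

event = "webserver01: user=jsmith, msg=login failed, retry later"

# A prefix regular expression with a named group; the named group
# becomes an additional column (here: host).
prefix = re.match(r"(?P<host>\S+):\s*", event)
columns = dict(prefix.groupdict())          # {'host': 'webserver01'}
rest = event[prefix.end():]

# 'msg' is declared as the last key, so everything after "msg=" is kept
# as its value even though it contains the comma separator.
head, _, tail = rest.partition("msg=")
for pair in filter(None, re.split(r",\s*", head)):
    key, _, value = pair.partition("=")
    columns[key.strip()] = value.strip()
columns["msg"] = tail

print(columns)
# {'host': 'webserver01', 'user': 'jsmith', 'msg': 'login failed, retry later'}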

JSON Parser

This parser parses JSON logs and accepts valid JSON as input. The parser recognizes key-value pairs, key-object pairs, and arrays.

When referring to a key in a column expression, the column name is referred to as $<key name>. When referring to array elements, the column name is referred to as $<key name>_<index_of_array>, where <index_of_array> starts at 0. When referring to nested objects, the column name is referred to as $<key name>_<object name>.

For example, for the following JSON data:

{"key1":"value1","key2":"value2"}
the column names would be:
  • key1
  • key2
You might need to change the default column names to ensure that only the supported characters are used. You can use underscore (_) and alphanumeric characters. However, a column name cannot begin with a number.
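
The following Python sketch (an illustration of the naming scheme only, not the parser itself) flattens a small, invented JSON document in the same way, appending _<index> for array elements and _<object name> for nested objects:

import json

event = '{"user": "jsmith", "roles": ["admin", "audit"], "geo": {"city": "Austin"}}'

def flatten(value, name, columns):
    # Objects append _<key name>; arrays append _<index>, starting at 0.
    if isinstance(value, dict):
        for key, item in value.items():
            flatten(item, f"{name}_{key}" if name else key, columns)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flatten(item, f"{name}_{index}", columns)
    else:
        columns[name] = value

columns = {}
flatten(json.loads(event), "", columns)
print(columns)
# {'user': 'jsmith', 'roles_0': 'admin', 'roles_1': 'audit', 'geo_city': 'Austin'}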

XML Parser

This parser parses XML logs, and accepts syntactically valid XML as input. The parser recognizes element nodes, text nodes, and attribute nodes.

When referring to a key in a column expression, the column name is referred to as $<key name>. When referring to sibling elements, the column name is referred to as $<key name>_<index_of_element>, where <index_of_element> starts at 1. When referring to nested XML elements, the column name is referred to as $<key name>_<element_name>.

For example, for the following XML elements:

<Root>
  <child> This is child1</child>
  <child> This is child2</child>
</Root>
the column names would be:
  • Root_child_1
  • Root_child_2
You might need to change the default column names to ensure that only the supported characters are used. You can use underscore (_) and alphanumeric characters. However, a column name cannot begin with a number.
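
A small Python sketch (illustrative only, using the standard xml.etree module rather than the product parser, and simplified to number every repeated child element) produces the same Root_child_1 and Root_child_2 names for the example above:

import xml.etree.ElementTree as ET

event = "<Root><child> This is child1</child><child> This is child2</child></Root>"

# Sibling elements with the same name are numbered starting at 1.
root = ET.fromstring(event)
columns = {}
for index, child in enumerate(root, start=1):
    columns[f"{root.tag}_{child.tag}_{index}"] = child.text.strip()

print(columns)
# {'Root_child_1': 'This is child1', 'Root_child_2': 'This is child2'}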

Columnar Parser

The data is extracted into different columns. This parser operates on data that is separated by a character or a sequence of characters, for example, a comma or a tab. There are no keys, only values; different log sources can therefore yield different sets of columns, depending on the data. When referring to a column in a column expression, it is referred to as $<column number>. So the first column is referred to as $1, the second column is $2, and so on.
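
As a minimal Python illustration (not the parser implementation), splitting an invented comma-separated event yields positional columns corresponding to $1, $2, and so on:

event = "2023-10-11,webserver01,login,failed"

# Each separated field becomes a positional column: $1, $2, $3, $4.
columns = event.split(",")

print(columns[0])  # $1 -> "2023-10-11"
print(columns[1])  # $2 -> "webserver01"
print(columns[3])  # $4 -> "failed"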

You can also use regular expressions to parse data from the beginning and end of an event, for example, when parsing events that either start with or end with data that is not in columnar format. If the regular expressions contain named groups, those groups are extracted and are used to fill values in the columns.

Regex Parser

A regular expression (regex) is a sequence of characters that forms a search pattern, mainly used for pattern matching within strings. LogLogic LMI can use regular expressions to extract columns from matched events.
Note: Working knowledge of regular expressions is a prerequisite.

Each character in a regular expression is either a metacharacter with a special meaning, or a regular character with its literal meaning. Together, they can be used to identify text that matches a given pattern, from an exact string up to a very general similarity.

LogLogic LMI supports regular expression metacharacters based on Java regular expressions. For details, see Supported Regular Expression Characters.

Columns are extracted using the capturing group pattern (plain parentheses), the named capturing group pattern (?<name>), or a combination of both. In a column expression, a column extracted by a named capturing group is referred to by the group name preceded by “$”, and a column extracted by an unnamed capturing group is referred to by “$” followed by the group index. So the first unnamed group column is referred to as $1, the second as $2, and so on, while a group named “user” is referred to as $user. When named and unnamed capturing groups are combined, the named capturing group columns must be referred to by their names rather than by “$” followed by their index.
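
The following Python sketch illustrates how named and unnamed capturing groups map to column references (the product uses Java regular expressions, where the named-group syntax is (?<name>...) rather than Python's (?P<name>...)); the pattern and the sample event are invented:

import re

event = "Oct 11 22:14:15 host sshd[4711]: Failed password for jsmith"

# One unnamed capturing group (the process id) and one named
# capturing group (user).
pattern = re.compile(r"sshd\[(\d+)\]: Failed password for (?P<user>\S+)")
match = pattern.search(event)

print(match.group(1))       # unnamed group, referenced as $1 -> "4711"
print(match.group("user"))  # named group, referenced as $user -> "jsmith"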

CEF Parser

HP ArcSight Common Event Format (CEF) is an open log management standard. CEF defines a syntax that comprises a standard header and a variable extension, formatted as key-value pairs. Based on the ArcSight Extension Dictionary, the CEF header columns Version, Device Vendor, Device Product, Device Version, Signature ID, Name, and Severity are extracted into columns with their names, and expressions set to $cefVersion, $cefDeviceVendor, $cefDeviceProduct, $cefDeviceVersion, $cefSignatureID, $cefName, and $cefSeverity respectively.
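
As a rough Python illustration (not the CEF parser itself), the seven header fields can be read by splitting on unescaped pipes; the sample event below follows the documented CEF layout but is invented:

import re

event = ("CEF:0|Security|threatmanager|1.0|100|"
         "worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2")

# Split on pipes that are not preceded by a backslash; the first seven
# fields form the CEF header, the remainder is the key-value extension.
fields = re.split(r"(?<!\\)\|", event, maxsplit=7)
header = dict(zip(
    ["cefVersion", "cefDeviceVendor", "cefDeviceProduct", "cefDeviceVersion",
     "cefSignatureID", "cefName", "cefSeverity"],
    [fields[0].removeprefix("CEF:")] + fields[1:7],
))
extension = fields[7]

print(header["cefDeviceVendor"])  # -> "Security"
print(extension)                  # -> "src=10.0.0.1 dst=2.1.2.2"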

The name of a column for an extension listed in the ArcSight Extension Dictionary is the full name of the extension. The name of a column for an extension that is not listed in the ArcSight Extension Dictionary is the key name as it appears in the data, preceded by “$”.

The expressions of the non-timestamp extension columns are the CEF Key Names as defined in the ArcSight Extension Dictionary. The expressions of the timestamp extension columns are of the form ToTimestamp(<$CEF Key Name>, <proposed format>) where <proposed format> is a suggestion for the correct format to use when parsing the data.

Some extensions in the ArcSight Extension Dictionary have names that start with an asterisk (*). Because LogLogic LMI does not allow column names to start with an asterisk (*), the asterisk is omitted from the column name. For example, the *sourceProcessId extension is extracted into a column named sourceProcessId.

When the event was written, the pipe (|), equal sign (=), and backslash (\) characters might have been escaped by inserting a backslash (\) in front of them. The CEF parser removes the backslash (\) character, returning the data to its original form. For example, if the value of the Name header in the event is "detected a \| in message", the value of the cefName column is "detected a | in message".
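
A minimal Python illustration of that unescaping step, assuming only the pipe, equal sign, and backslash escapes described above:

import re

value = r"detected a \| in message"

# Drop the escaping backslash in front of |, =, and \ to restore the
# original text.
unescaped = re.sub(r"\\([|=\\])", r"\1", value)

print(unescaped)  # -> "detected a | in message"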

Syslog Parser

Data conforming to the Syslog standard defined in RFC-5424 (https://tools.ietf.org/html/rfc5424) can be parsed using the Syslog Parser.
Note: The older, obsolete format described in RFC-3164 is not supported.

All the header fields defined in the format are extracted, as is the Message component. If the log data contains Structured Data elements, those are extracted as well, with the names of the resulting columns composed as <element-name>.<key name>, as shown in the following example:

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry

The following columns are extracted:

  • facility = local4;
  • severity = notice;
  • version = 1;
  • timestamp = 2003-10-11 15:14:15 (if LogLogic LMI is running in the PDT time zone);
  • hostname = mymachine.example.com;
  • appname = evntslog;
  • procid = <null>;
  • msgid = ID47;
  • exampleSDID@32473.iut = 3;
  • exampleSDID@32473.eventSource = Application;
  • exampleSDID@32473.eventID = 1011;
  • msg = An application event log entry
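
The facility and severity columns are derived from the numeric priority value inside the angle brackets. As a short Python illustration, <165> decomposes into facility local4 and severity notice as follows:

pri = 165                 # the value inside <...> in the example above

facility_code = pri // 8  # 165 // 8 = 20 -> local4
severity_code = pri % 8   # 165 %  8 = 5  -> notice

print(facility_code, severity_code)  # -> 20 5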