Data Models

LogLogic LMI parses log data into a structured format to enhance search and analysis. Based on the log source type, you can define how to parse your data and which columns to extract.

Functions of Data Models

Using data models, you can:

  • Define parsing rules that extract columns from your data.
  • Define a schema for an event.
  • Name and specify data type for extracted columns.

Modes to Add a Data Model

You can add data models using two different modes:

  • Graphical mode: The default mode. A wizard helps you add data models and the associated rules. For more information, see Adding a Data Model in Graphical Mode.
  • Raw mode: For advanced users who understand JSON syntax. Use JSON syntax to add a data model and the associated rules. For more information, see Adding a Data Model in Raw Mode.

You can switch between the modes at any time. All information associated with a data model is preserved when you switch from graphical to raw mode.

You can create a data model that defines which log source to use for parsing, based on data relevance. For multiple log sources, you can define the order of precedence in a specified query. System columns are event metadata. In the Columns panel, all system columns are displayed with the prefix sys_, and all columns from built-in parsers are displayed with the prefix ll_.

LogLogic LMI provides built-in data models. For the list of built-in data models, see the Supported Log Sources list in the TIBCO LogLogic® Log Source Packages Installation and Upgrade Guide, which is available on the TIBCO eDelivery website or TIBCO Support website after logging in.

Functions of Parsing Rules

A data model can be associated with multiple parsing rules. Sometimes, within the same source, some logs are completely different from others, and it is not practical, or even possible, to match them all with a single rule. Each kind of log needs its own way of parsing, which you provide by defining several rules, each targeting one type of log.

If a data model has more than one parsing rule defined, the extracted column set is the union of the column sets of all parsing rules, plus the additional system-defined columns. For example, create a data model with two parsing rules: Rule1 extracts four defined columns and Rule2 extracts eight different defined columns. When you run a search query on this data model, all 12 columns are displayed.

Parsing rules are applied top to bottom in the order they are defined in a data model. For example, if Rule1 matches some of your data, it is used to extract column values. If Rule1 fails to match your data, Rule2 is applied, and so on. You can change the order of parsing rules.
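The following Python sketch (not LMI code; the rule names and patterns are hypothetical) illustrates this first-match evaluation order:

import re

# Hypothetical rules, listed in the order they are defined in the data model.
rules = [
    ("Rule1", re.compile(r"user=(?P<user>\w+) action=(?P<action>\w+)")),
    ("Rule2", re.compile(r"src=(?P<src>\S+) dst=(?P<dst>\S+)")),
]

def parse(event):
    # Try each rule top to bottom; the first rule that matches wins.
    for name, pattern in rules:
        match = pattern.search(event)
        if match:
            return name, match.groupdict()
    return None, {}  # no rule matched this event

print(parse("user=alice action=login"))    # handled by Rule1
print(parse("src=10.0.0.1 dst=10.0.0.2"))  # Rule1 fails, Rule2 applies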

Types of Parsers

LogLogic LMI supports the following types of parsers:

Key-value Parser

This parser uses simple key-value pair parsing rules to extract keys and values. The parser recognizes patterns such as k1=v1, k2=v2, k3=v3. You can specify separators between key-value pairs, for example, space, comma (,), or semicolon (;), and separators between a key and its value, for example, equal sign (=) or colon (:). Separators can be either one or more characters that must be matched exactly, or regular expressions.

When referring to a value in a column expression, it is referred to as $<key name>. For example, for a key named user, the value is referred to as $user.

Regular expressions can also be used to parse data from the beginning and end of the event. This can be useful when parsing events that either start with or end with data that is not in the key-value pair format. If these regular expressions contain named groups, those groups are extracted and can be used to populate columns.

It is also possible to specify the name of the last key in the data. Any data after that last key is treated as the value of that last key. This can be useful in situations where the last value in the data contains characters that might be interpreted as separators.
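As a rough illustration of the mechanics, the following Python sketch (a simplification, not the LMI implementation; the separators and sample event are assumptions) splits an event on pair separators, then on key-value separators:

import re

def parse_key_value(event, pair_sep=r"[ ,;]+", kv_sep=r"[=:]"):
    # Split the event into pairs, then split each pair into key and value.
    columns = {}
    for pair in re.split(pair_sep, event.strip()):
        parts = re.split(kv_sep, pair, maxsplit=1)
        if len(parts) == 2:
            columns[parts[0]] = parts[1]
    return columns

columns = parse_key_value("user=alice action=login result=success")
print(columns["user"])  # the value referenced as $user -> "alice"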

JSON Parser

This parser parses JSON logs and accepts valid JSON as input. The parser recognizes key-value pairs, key-object pairs, and arrays.

When referring to a key in a column expression, the column name is referred to as $<key name>. When referring to array elements, the column name is referred to as $<key name>_<index_of_array>, where <index_of_array> starts at 0. When referring to objects, the column name is referred to as $<key name>_<object name>.

For example, for the following JSON event:

{"key1":"value1","key2":"value2"}
the column names would be:

key1

key2

You might need to change the default column names to ensure that only the supported characters are used. You can use underscore (_) and alphanumeric characters. However, a column name cannot begin with a number.
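To make these naming rules concrete, here is a minimal Python sketch (not LMI code) that flattens a JSON event into column names following the conventions above:

import json

def flatten(value, prefix=""):
    # Flatten JSON into column-name/value pairs: <key> for plain keys,
    # <key>_<index> for array elements (0-based), and
    # <key>_<nested key> for object members.
    columns = {}
    if isinstance(value, dict):
        for key, item in value.items():
            columns.update(flatten(item, f"{prefix}_{key}" if prefix else key))
    elif isinstance(value, list):
        for i, item in enumerate(value):
            columns.update(flatten(item, f"{prefix}_{i}"))
    else:
        columns[prefix] = value
    return columns

event = json.loads('{"key1":"value1","tags":["a","b"],"geo":{"city":"Paris"}}')
print(flatten(event))
# {'key1': 'value1', 'tags_0': 'a', 'tags_1': 'b', 'geo_city': 'Paris'}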

XML Parser

This parser parses XML logs and accepts syntactically valid XML as input. The parser recognizes element nodes, text nodes, and attribute nodes.

When referring to a key in a column expression, the column name is referred to as $<key name>. When referring to sibling elements, the column name is referred to as $<key name>_<index_of_element>, where <index_of_element> starts at 1. When referring to XML elements, the column name is referred to as $<key name>_<element_name>.

For example, for the following XML elements:

<Root>
  <child> This is child1</child>
  <child> This is child2</child>
</Root>
the column names would be:

Root_child_1

Root_child_2

You might need to change the default column names to ensure that only the supported characters are used. You can use underscore (_) and alphanumeric characters. However, a column name cannot begin with a number.
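A minimal Python sketch of this naming convention (simplified; not the LMI implementation):

import xml.etree.ElementTree as ET

def flatten(parent, prefix):
    # Name leaf columns <parent>_<tag>, appending a 1-based index when
    # several sibling elements share the same tag.
    columns = {}
    counts = {}
    for child in parent:
        counts[child.tag] = counts.get(child.tag, 0) + 1
    index = {}
    for child in parent:
        name = f"{prefix}_{child.tag}"
        if counts[child.tag] > 1:
            index[child.tag] = index.get(child.tag, 0) + 1
            name = f"{name}_{index[child.tag]}"
        if list(child):
            columns.update(flatten(child, name))
        else:
            columns[name] = (child.text or "").strip()
    return columns

root = ET.fromstring(
    "<Root><child>This is child1</child><child>This is child2</child></Root>")
print(flatten(root, root.tag))
# {'Root_child_1': 'This is child1', 'Root_child_2': 'This is child2'}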

Columnar Parser

The data is extracted into different columns. This parser operates on data that is separated by a character or a sequence of characters, for example, a comma or a tab. There is no key-value pair, just the value. Data from different log sources can extract different columns, depending on the structure of the data.

When referring to a column in a column expression, it is referred to as $<column number>: the first column is $1, the second column is $2, and so on. You can also use regular expressions to parse data from the beginning and end of an event, for example, when parsing events that start or end with data that is not in columnar format. If the regular expressions contain named groups, those groups are extracted and used to fill values in the columns.
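For example, a minimal Python sketch of positional extraction (the sample event and separator are hypothetical):

# A comma-separated event; values are referenced by position only.
event = "2023-10-11,10.0.0.5,GET,/index.html,200"

values = event.split(",")
columns = {f"${i}": v for i, v in enumerate(values, start=1)}
print(columns["$1"], columns["$5"])  # 2023-10-11 200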

Regex Parser

A regular expression (regex) is a sequence of characters that forms a search pattern, mainly for use in pattern matching within strings. LogLogic LMI can use regular expressions to extract columns from matched events.
Note: Working knowledge of regular expressions is a prerequisite.

Each character in a regular expression is either a metacharacter with a special meaning, or a regular character with its literal meaning. Together, they can identify textual material of a given pattern, or process a number of instances of it, ranging from an exact match to a very general similarity to the pattern.

LogLogic LMI supports regular expression metacharacters based on Java regular expressions. For details, see Supported Regular Expression Characters.

Columns are extracted using either the capturing group pattern (simple parentheses), the named capturing group pattern (?<name>), or a combination of both. When using named capturing groups, the column name in a column expression is the group name preceded by "$"; a group named "user" is referred to as $user. When using unnamed capturing groups, the name is "$" followed by the group index: the first unnamed group column is referred to as $1, the second as $2, and so on. When using a combination of named and unnamed capturing groups, the named capturing group columns must be referred to by their given names rather than by "$" followed by their index.
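The following Python sketch illustrates combining named and unnamed capturing groups (the event and pattern are hypothetical). Note that LMI follows Java regex syntax, where a named group is written (?<name>...), whereas Python's re module spells it (?P<name>...):

import re

event = "Oct 11 22:14:15 login failed for user=alice"
pattern = re.compile(r"^(\w+ \d+ [\d:]+) .* user=(?P<user>\w+)$")

match = pattern.match(event)
if match:
    print(match.group(1))       # referenced as $1 -> "Oct 11 22:14:15"
    print(match.group("user"))  # referenced as $user -> "alice"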

CEF Parser

HP ArcSight Common Event Format (CEF) is an open log management standard. CEF defines a syntax that comprises a standard header and a variable extension, formatted as key-value pairs. Based on the ArcSight Extension Dictionary, the CEF header columns Version, Device Vendor, Device Product, Device Version, Signature ID, Name, and Severity are extracted into columns with their names, and expressions set to $cefVersion, $cefDeviceVendor, $cefDeviceProduct, $cefDeviceVersion, $cefSignatureID, $cefName, and $cefSeverity respectively.

The name of a column for an extension listed in the ArcSight Extension Dictionary is the full name of the extension. The name of a column for an extension that is not listed in the ArcSight Extension Dictionary is the key name as it is displayed in the data preceded with “$”.

The expressions of the non-timestamp extension columns are the CEF Key Names as defined in the ArcSight Extension Dictionary. The expressions of the timestamp extension columns are of the form ToTimestamp(<$CEF Key Name>, <proposed format>) where <proposed format> is a suggestion for the correct format to use when parsing the data.

Some extensions in the ArcSight Extension Dictionary have names that start with an asterisk (*). Because LogLogic LMI does not allow column names to start with an asterisk, the asterisk is omitted from the column name. For example, the *sourceProcessId extension is extracted into a column named sourceProcessId.

When the event was written, the pipe (|), equal sign (=), and backslash (\) characters might have been escaped by inserting a backslash (\) in front of them. The CEF parser removes the backslash (\) character, returning the data to its original form. For example, if the value of the Name header in the event is "detected a \| in message", the value of the cefName column is "detected a | in message".
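A simplified Python sketch of the header split and unescaping described above (the sample event is hypothetical, and the extension parsing ignores escaped separators and values containing spaces):

import re

event = (r"CEF:0|Security|threatmanager|1.0|100|detected a \| in message|10|"
         "src=10.0.0.1 spt=1232")

# Split the header on unescaped pipes only, then remove the escaping.
parts = re.split(r"(?<!\\)\|", event, maxsplit=7)
header_names = ["cefVersion", "cefDeviceVendor", "cefDeviceProduct",
                "cefDeviceVersion", "cefSignatureID", "cefName", "cefSeverity"]
columns = {}
for name, raw in zip(header_names, parts):
    columns[name] = raw.replace("\\|", "|").replace("\\\\", "\\")
columns["cefVersion"] = columns["cefVersion"].split(":", 1)[1]  # drop "CEF:"

# Parse the extension as key=value pairs.
for pair in parts[7].split():
    key, _, value = pair.partition("=")
    columns[key] = value

print(columns["cefName"])  # detected a | in message
print(columns["src"])      # 10.0.0.1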

Syslog Parser

Data conforming to the Syslog standard defined in RFC-5424 (https://tools.ietf.org/html/rfc5424) can be parsed using the Syslog Parser.
Note: The older, obsolete format described in RFC-3164 is not supported.

All the header fields defined in the format are extracted, as is the Message component. If the log data contains Structured Data elements, those are extracted as well, with the names of the resulting columns composed as <element-name>.<key name>, as shown in the following example:

<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry 

The following columns are extracted:

  • facility = local4;
  • severity = notice;
  • version = 1;
  • timestamp = 2003-10-11 15:14:15 (if LogLogic LMI is running in the PDT time zone);
  • hostname = mymachine.example.com;
  • appname = evntslog;
  • procid = <null>;
  • msgid = ID47;
  • exampleSDID@32473.iut = 3;
  • exampleSDID@32473.eventSource = Application;
  • exampleSDID@32473.eventID = 1011;
  • msg = An application event log entry
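The facility and severity columns are decoded from the PRI value between "<" and ">" at the start of the event (165 in the example): the facility is PRI divided by 8 and the severity is the remainder. A quick sketch of the arithmetic (the keyword tables are abbreviated from common syslog usage):

pri = 165

facility = pri // 8  # 165 // 8 = 20 -> local4
severity = pri % 8   # 165 %  8 = 5  -> notice

facilities = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr",
              "news", "uucp", "cron", "authpriv", "ftp", "ntp", "audit",
              "alert", "clock", "local0", "local1", "local2", "local3",
              "local4", "local5", "local6", "local7"]
severities = ["emerg", "alert", "crit", "err", "warning", "notice",
              "info", "debug"]

print(facilities[facility], severities[severity])  # local4 notice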