Data Models
LogLogic LMI parses log data into a structured format to enhance search and analysis.
Based on the log source type, you can define how to parse your data and which columns to extract.
Data models:
- define parsing rules that extract columns from your data
- define a schema for an event
- allow you to name and specify data type for extracted columns
A data model can define multiple parsing rules. Sometimes within the same source, some logs are completely different from others, and it is not practical, or even possible, to match them all with a single rule. Each kind of log needs its own way of parsing; you can achieve this by defining several rules, each targeting one type of log.
If a data model has more than one parsing rule defined, then the extracted column set is the union of the column sets of all parsing rules plus the additional system-defined columns. For example, suppose you create a data model with two parsing rules: Rule1 extracts four columns and Rule2 extracts eight different columns. When you run a search query on this data model, all 12 columns are displayed.
Parsing rules are applied top to bottom in the order they are defined in a data model. For example, if Rule1 matches some of your data, then it is used to extract column values. If Rule1 fails to match your data, then Rule2 is tried next, and so on. You can change the order of parsing rules.
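The first-match-wins ordering described above can be sketched as follows. This is a minimal illustration, not the actual LMI implementation; the rule names and patterns are hypothetical:

```python
import re

# Hypothetical rules, listed in the order they are defined in the data model.
# Each rule maps a regular expression to the columns it extracts via named groups.
RULES = [
    ("Rule1", re.compile(r"user=(?P<user>\w+) action=(?P<action>\w+)")),
    ("Rule2", re.compile(r"(?P<src>\S+) -> (?P<dst>\S+)")),
]

def parse_event(event):
    """Try each rule top to bottom; the first rule that matches extracts the columns."""
    for name, pattern in RULES:
        m = pattern.search(event)
        if m:
            return name, m.groupdict()
    return None, {}

rule, cols = parse_event("user=alice action=login")
# Rule1 matches, so Rule2 is never tried for this event.
```

Because rules are tried in order, reordering them can change which rule extracts a given event, which is why LMI lets you change the order of parsing rules.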
You can add data models in two different modes:
- Graphical mode: This is the default mode. A wizard helps you add a data model and the associated rules. For details, see Adding a Data Model in Graphical Mode.
- Raw mode: This mode is for advanced users who understand JSON syntax. Use JSON syntax to add a data model and associated rules. For details, see Adding a Data Model in Raw Mode.
You can switch between the modes at any time. All information associated with a data model is preserved when you switch from graphical to raw mode.
You can create a data model that defines which log source to use for parsing based on the relevance of the data. For multiple log sources, the order of precedence can be defined in a specified query. The system columns are event metadata. All system columns are displayed with the prefix sys_, and all columns from built-in parsers are displayed with the prefix ll_ in the Columns panel.
LogLogic LMI provides built-in data models. For a detailed list, see the Supported Log Sources list in the TIBCO LogLogic® Log Management Intelligence Release Notes.
LogLogic LMI supports the following types of parsers:
- Key-value Parser
- This parser uses simple key-value pair parsing rules to extract keys and values. The parser recognizes patterns like k1=v1, k2=v2, k3=v3. You can use key-value pair separators, for example, space, comma (,), or semi-colon (;), and key and value separators, for example, equal sign (=) or colon (:). Separators can be either one or more characters that have to be matched exactly or they can be regular expressions.
When referring to a value in a column expression, it is referred to as $<key name>. So for a key with name ‘user’ the value is referred to as $user.
Regular expressions can also be used to parse data from the beginning and ending of the event. This can be useful when parsing events that either start with or end with data that is not in the key-value pair format. If these regular expressions contain named groups, then those groups are extracted and can be used to populate columns.
It is also possible to specify the name of the last key in the data. Any data after that last key is treated as the value of that last key. This can be useful in situations where the last value in the data contains characters that might be interpreted as separators.
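The key-value behavior described above, including the last-key handling, can be sketched as follows. This is a simplified illustration under assumed defaults (comma-separated pairs, `=` between key and value), not the LMI parser itself:

```python
import re

def parse_key_value(event, pair_sep=r",\s*", kv_sep="=", last_key=None):
    """Split an event into key-value pairs; pair_sep may be a regular expression.

    If last_key is given, everything after '<last_key><kv_sep>' is kept as that
    key's value, even if it contains characters that look like separators.
    """
    result = {}
    if last_key is not None:
        marker = last_key + kv_sep
        head, sep, tail = event.partition(marker)
        if sep:
            result[last_key] = tail
            event = head
    for pair in re.split(pair_sep, event):
        if not pair:
            continue
        key, sep, value = pair.partition(kv_sep)
        if sep:
            result[key.strip()] = value
    return result

cols = parse_key_value("user=alice, action=login, msg=hello, world", last_key="msg")
# cols["user"] is referred to as $user in a column expression.
```

Without `last_key="msg"`, the comma inside the message would be treated as a pair separator and the value would be truncated to `hello`.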
- Columnar Parser
- The data is extracted into different columns. This parser operates on data that is separated by a character or a sequence of characters, for example, a comma or a tab. There is no key, only the value. Data from different log sources extracts into different columns depending on the data. When referring to a column in a column expression, it is referred to as $<column number>. So the first column is referred to as $1, the second column as $2, and so on.
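Columnar extraction amounts to splitting on the separator and numbering the positions. A minimal sketch, assuming a comma separator and hypothetical data:

```python
def parse_columnar(event, sep=","):
    """Split delimiter-separated data into positional columns $1, $2, ..."""
    return {"$%d" % i: value for i, value in enumerate(event.split(sep), start=1)}

cols = parse_columnar("2023-10-11,mymachine,login,alice")
# The first column is cols["$1"], the fourth is cols["$4"].
```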
- Regex Parser
- Regular expressions (Regex) are a sequence of characters that form a search pattern, mainly for use in pattern matching with strings or string matching.
LogLogic LMI can use regular expressions for extracting columns from matched events.
Each character in a regular expression is either a meta character with its special meaning, or a regular character with its literal meaning. Together, they can be used to identify textual material of a given pattern, or process a number of instances of it that can vary from a precise equality to a very general similarity of the pattern.
LogLogic LMI supports the regular expression meta characters, based on Java regular expressions. For details, see Supported Regular Expression Characters.
Columns are extracted using either the capturing group pattern (simple parentheses), the named capturing group pattern (?<name>), or a combination of both. When using named capturing groups, the column name is the group name preceded by "$". When using unnamed capturing groups, the name is "$" followed by the group index. So the first unnamed group column is referred to as $1, the second as $2, and so on, while a group named "user" is referred to as $user. When using a combination of named and unnamed capturing groups, the named capturing group columns must be referred to by their given names rather than by "$" followed by their index.
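The mix of named and unnamed capturing groups can be illustrated with a short sketch. Note that LMI uses Java regular expressions, where named groups are written (?<name>...); the Python sketch below uses Python's (?P<name>...) syntax for the same idea, and the pattern and log line are hypothetical:

```python
import re

# One named group (user) and one unnamed group (the IP address).
pattern = re.compile(r"login by (?P<user>\w+) from (\d+\.\d+\.\d+\.\d+)")

m = pattern.search("login by alice from 10.0.0.7")
user = m.group("user")   # referred to as $user in a column expression
ip = m.group(2)          # the unnamed group, referred to by its index
```

When groups are mixed like this, referring to the named group by name rather than by index avoids ambiguity, which mirrors the LMI rule above.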
- CEF Parser
- HP ArcSight Common Event Format (CEF) is an open log management standard. CEF defines a syntax that comprises a standard header and a variable extension, formatted as key-value pairs. Based on the ArcSight Extension Dictionary, the CEF header columns Version, Device Vendor, Device Product, Device Version, Signature ID, Name, and Severity are extracted into columns with their names, and expressions set to $cefVersion, $cefDeviceVendor, $cefDeviceProduct, $cefDeviceVersion, $cefSignatureID, $cefName, and $cefSeverity respectively.
The name of a column for an extension listed in the ArcSight Extension Dictionary is the full name of the extension. The name of a column for an extension that is not listed in the ArcSight Extension Dictionary is the key name as it is displayed in the data preceded with “$”.
The expressions of the non-timestamp extension columns are the CEF Key Names as defined in the ArcSight Extension Dictionary. The expressions of the timestamp extension columns are of the form ToTimestamp(<$CEF Key Name>, <proposed format>) where <proposed format> is a suggestion for the correct format to use when parsing the data.
Some extensions in the ArcSight Extension Dictionary have names that start with the asterisk (*). Since LogLogic LMI does not allow column names to start with asterisk (*), an asterisk (*) is omitted from the column name. For example, the *sourceProcessId extension is extracted into a column named sourceProcessId.
When the event was written, the pipe (|), equal sign (=), and backslash (\) characters might have been escaped by inserting a backslash (\) in front of them. The CEF parser removes the backslash (\) character, returning the data to its original form. For example, if the value of the Name header in the event is "detected a \| in message", the value of the cefName column is "detected a | in message".
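The header split and unescaping behavior can be sketched as follows. This is a simplified illustration of the format, not the LMI CEF parser: it assumes space-free extension values and, for simplicity, keeps every extension key as $<key> rather than looking it up in the ArcSight Extension Dictionary:

```python
import re

CEF_HEADER = ["cefVersion", "cefDeviceVendor", "cefDeviceProduct",
              "cefDeviceVersion", "cefSignatureID", "cefName", "cefSeverity"]

def parse_cef(event):
    """Split the seven pipe-delimited CEF header fields, honoring \\| escapes,
    then parse the extension as key=value pairs."""
    body = event[len("CEF:"):] if event.startswith("CEF:") else event
    # Split on pipes that are not preceded by a backslash.
    parts = re.split(r"(?<!\\)\|", body, maxsplit=7)
    cols = {}
    for name, raw in zip(CEF_HEADER, parts):
        # Undo the \|, \=, and \\ escapes, returning the data to its original form.
        cols[name] = raw.replace(r"\|", "|").replace(r"\=", "=").replace("\\\\", "\\")
    if len(parts) > 7:
        # Simplification: extension values are assumed to contain no spaces.
        for m in re.finditer(r"(\w+)=(\S+)", parts[7]):
            cols["$" + m.group(1)] = m.group(2)
    return cols

cols = parse_cef(r"CEF:0|Vendor|Product|1.0|42|detected a \| in message|5|src=10.0.0.1 act=blocked")
# The escaped pipe in the Name header is restored to a literal "|".
```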
- Syslog Parser
- Data conforming to the Syslog standard defined in RFC-5424 (https://tools.ietf.org/html/rfc5424) can be parsed using the Syslog Parser.
All the header fields defined in the format are extracted, as is the Message component. If the log data contains Structured Data elements, those are extracted as well, with the names of the resulting columns composed as <element-name>.<key name>, as shown in the following example:
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry
The following columns are extracted:
- facility = local4
- severity = notice
- version = 1
- timestamp = 2003-10-11 15:14:15 (if LogLogic LMI is running in the PDT time zone)
- hostname = mymachine.example.com
- appname = evntslog
- procid = <null>
- msgid = ID47
- exampleSDID@32473.iut = 3
- exampleSDID@32473.eventSource = Application
- exampleSDID@32473.eventID = 1011
- msg = An application event log entry
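The extraction above can be sketched in a few lines. This is a simplified illustration of RFC 5424 parsing against the example event, not the LMI Syslog Parser (for instance, it assumes a single Structured Data element with no escaped `]` characters, and it leaves procid as the literal "-"):

```python
import re

# Minimal RFC 5424 shape: PRI, VERSION, TIMESTAMP, HOSTNAME, APP-NAME,
# PROCID, MSGID, STRUCTURED-DATA (or "-"), then the free-form MSG.
SYSLOG = re.compile(
    r"<(?P<pri>\d+)>(?P<version>\d+) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<appname>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?:\[(?P<sd>[^\]]*)\]|-) ?(?P<msg>.*)")

def parse_syslog(line):
    m = SYSLOG.match(line)
    cols = m.groupdict()
    pri = int(cols.pop("pri"))
    # PRI encodes facility and severity: PRI = facility * 8 + severity.
    cols["facility"], cols["severity"] = pri // 8, pri % 8
    sd = cols.pop("sd")
    if sd:
        # Structured Data columns are named <element-name>.<key name>.
        element, _, params = sd.partition(" ")
        for key, value in re.findall(r'(\S+)="([^"]*)"', params):
            cols["%s.%s" % (element, key)] = value
    return cols

cols = parse_syslog('<165>1 2003-10-11T22:14:15.003Z mymachine.example.com '
                    'evntslog - ID47 [exampleSDID@32473 iut="3" '
                    'eventSource="Application" eventID="1011"] '
                    'An application event log entry')
# PRI 165 decodes to facility 20 (local4) and severity 5 (notice).
```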