Aggregation Rules

You can define aggregation rules to optimize the performance of aggregation (GROUP BY) queries.

After you create and enable aggregation rules, the system precomputes the aggregations as and when log events arrive in the system. The aggregations are used to compute search query results. As time progresses, the aggregations are precomputed at real time, providing query results much faster than the queries that were not optimized. Such optimized aggregation queries can be vital for creating responsive dashboards.

Note: To use aggregation rules, Advanced Features and Advanced Aggregation must be enabled. Contact your administrator to enable the features.

An aggregation rule is defined as a regular EQL or SQL query that contains a GROUP BY statement and aggregated projections. For details, see GROUP BY Statement.

For example, consider the data model for BW process information. This data model has columns such as Domain, AppSpace, Application, Process, Activity, exectionTime, success, and failure.

Consider the aggregation query as:
USE BWProcesses | GROUP BY Domain, AppSpace,  Application, Process COLUMNS avg(executionTime), max(failure), min(success)
For this search query, the aggregation rule maintains the aggregate metrics avg(executionTime), max(failure), min(success) for each combination of the GROUP BY fields: Domain, AppSpace, Application, and Process. The computation happens on every relevant log event ingested in the system at real time.

When you use an aggregation query for which there is a matching aggregation rule, the query is a candidate for optimization. If pre-computed metrics are already present, the results are returned without any query-time computation.

The time filters in the aggregation query must be within the optimized scope of the rule, that is, between the time specified in the Compute aggregation for field and the time of the query.

For example, suppose the rule was created at 10 am on Jan 1, with a retention period of 2 weeks. After 2 days, on Jan 3, if you query for aggregated data which matches the rule but with a time filter condition of -2w, the result is still unoptimized because 2 weeks from the time of search results in time range starting from Dec 19, but the aggregation rule was not created on Dec 19. However, if you specify the Compute aggregation for value as -2w while creating the aggregation rule, then the aggregation is computed from Dec 19 onwards.

Aggregation rules also support grouping by time aggregates. The supported time aggregations are:

Time function Groups by
years(sys_eventTime, [multiplier]) Years
months(sys_eventTime, [multiplier]) Months
weeks(sys_eventTime, [multiplier]) Weeks
days(sys_eventTime, [multiplier]) Days
hours(sys_eventTime, [multiplier]) Hours
minutes(sys_eventTime, [multiplier]) Minutes
seconds(sys_eventTime, [multiplier]) Seconds

In these functions, the multiplier parameter is optional. For more information, see: Time functions.

For example:
USE BWProcesses | GROUP BY Domain, AppSpace,  Application, Process, days(sys_eventTime), hours(sys_eventTime), minutes(sys_eventTime, 10) COLUMNS avg(executionTime), max(failure), min(success)
This rule computes the metrics avg(executionTime), max(failure), min(success) for each combination of Domain, AppSpace, Application, and Process across every 10 minutes, every hour, and every day.

Such queries retrieve the time series data for trend analysis. For example, avg(executionTime), max(failure), min(success) for AppSpace=‘appSpace1’ AND Process='Process1' AND Domain='domain1' aggregated for each hour for the day='Monday'.

This query can be used in the Advanced Dashboard to create a time series chart showing the trend, for example, average execution time for a process across hours of the day.

Tip: After an aggregation rule is created, a filter Blok is automatically created in the system. If there are multiple time clauses in the GROUP BY query, multiple filter Bloks are created - one for each time clause. To save on query response time, you can use the filter Blok in Advanced Dashboards instead of typing the entire query.
An aggregation query can be equal to or subset of another rule query if all of the following conditions are met:
  • The USE or FROM clauses are equal
  • The GROUP BY non-time aggregates are exactly equal
  • The time aggregates are equal or subset
  • The projection aggregates of the two queries are equal or subset

Examples

The following examples illustrate how a search query is considered to match an existing aggregation rule.

Consider the aggregation rule as:

USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, weeks(sys_eventTime), days(sys_eventTime), hours(sys_eventTime, 5)
COLUMNS max(ll_systemCPU), max(ll_processCPU), avg(ll_systemCPU), avg(ll_processCPU)
Now consider the following search queries:
Query1
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, weeks(sys_eventTime) COLUMNS max(ll_systemCPU),avg(ll_processCPU)
Query1 matches the aggregation rule because all conditions match:
  • The USE clause in Query1 is the same as that of the aggregation rule.
  • The GROUP BY non-time aggregate (ll_nodeId) in Query1 is exactly the same as that in the aggregation rule.
  • The time aggregate column weeks(sys_eventTime) is a subset of that in the aggregation rule. So, the GROUP BY clause of Query1 is a subset of the GROUP BY clause of the aggregation rule.
  • The projection aggregates in Query1, max(ll_systemCPU), avg(ll_processCPU), are a subset of that in the aggregation rule.
Query2
USE LogLogic_Monitor_Cpu, Loglogic_Appliance | GROUP BY ll_nodeId, weeks(sys_eventTime) COLUMNS max(ll_systemCPU), avg(ll_processCPU)

Query2 does not match the aggregation rule because:

The USE clause in Query2 is not same as that of the aggregation rule; it has an extra data model Loglogic_Appliance.

Query3
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, ll_pRuleID, weeks(sys_eventTime) COLUMNS max(ll_systemCPU), avg(ll_processCPU)

Query3 does not match the aggregation rule because:

The GROUP BY non-time aggregate in Query3 is not exactly the same as that in the aggregation rule; it has an extra column ll_pRuleID.

Query4
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, minutes(sys_eventTime) COLUMNS max(ll_systemCPU), avg(ll_processCPU)

Query4 does not match the aggregation rule because:

The time GROUP BY in Query4 is not equal to or a subset of that in the aggregation rule; it has a different column minutes(sys_eventTime).

Query5
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, minutes(sys_eventTime) COLUMNS max(ll_systemCPU), sum(ll_processCPU)

Query5 does not match the aggregation rule because:

The projection columns in Query5 are not equal to or a subset of that in the aggregation rule; it has a different column sum(ll_processCPU).

Limitations

You cannot create precomputed aggregation sets for infrastructure queries.