Aggregation Rules
You can define aggregation rules to optimize the performance of aggregation (GROUP BY) queries.
After you create and enable aggregation rules, the system precomputes the aggregations as and when log events arrive in the system. The aggregations are used to compute search query results. As time progresses, the aggregations are precomputed at real time, providing query results much faster than the queries that were not optimized. Such optimized aggregation queries can be vital for creating responsive dashboards.
An aggregation rule is defined as a regular EQL or SQL query that contains a GROUP BY statement and aggregated projections. For details, see GROUP BY Statement.
For example, consider the data model for BW process information. This data model has columns such as Domain, AppSpace, Application, Process, Activity, exectionTime, success, and failure.
USE BWProcesses | GROUP BY Domain, AppSpace, Application, Process COLUMNS avg(executionTime), max(failure), min(success)For this search query, the aggregation rule maintains the aggregate metrics avg(executionTime), max(failure), min(success) for each combination of the GROUP BY fields: Domain, AppSpace, Application, and Process. The computation happens on every relevant log event ingested in the system at real time.
When you use an aggregation query for which there is a matching aggregation rule, the query is a candidate for optimization. If pre-computed metrics are already present, the results are returned without any query-time computation.
The time filters in the aggregation query must be within the optimized scope of the rule, that is, between the time specified in the Compute aggregation for field and the time of the query.
For example, suppose the rule was created at 10 am on Jan 1, with a retention period of 2 weeks. After 2 days, on Jan 3, if you query for aggregated data which matches the rule but with a time filter condition of -2w, the result is still unoptimized because 2 weeks from the time of search results in time range starting from Dec 19, but the aggregation rule was not created on Dec 19. However, if you specify the Compute aggregation for value as -2w while creating the aggregation rule, then the aggregation is computed from Dec 19 onwards.
Aggregation rules also support grouping by time aggregates. The supported time aggregations are:
Time function | Groups by |
---|---|
years(sys_eventTime, [multiplier]) | Years |
months(sys_eventTime, [multiplier]) | Months |
weeks(sys_eventTime, [multiplier]) | Weeks |
days(sys_eventTime, [multiplier]) | Days |
hours(sys_eventTime, [multiplier]) | Hours |
minutes(sys_eventTime, [multiplier]) | Minutes |
seconds(sys_eventTime, [multiplier]) | Seconds |
In these functions, the multiplier parameter is optional. For more information, see: Time functions.
USE BWProcesses | GROUP BY Domain, AppSpace, Application, Process, days(sys_eventTime), hours(sys_eventTime), minutes(sys_eventTime, 10) COLUMNS avg(executionTime), max(failure), min(success)This rule computes the metrics avg(executionTime), max(failure), min(success) for each combination of Domain, AppSpace, Application, and Process across every 10 minutes, every hour, and every day.
Such queries retrieve the time series data for trend analysis. For example, avg(executionTime), max(failure), min(success) for AppSpace=‘appSpace1’ AND Process='Process1' AND Domain='domain1' aggregated for each hour for the day='Monday'.
This query can be used in the Advanced Dashboard to create a time series chart showing the trend, for example, average execution time for a process across hours of the day.
Examples
The following examples illustrate how a search query is considered to match an existing aggregation rule.
Consider the aggregation rule as:
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, weeks(sys_eventTime), days(sys_eventTime), hours(sys_eventTime, 5) COLUMNS max(ll_systemCPU), max(ll_processCPU), avg(ll_systemCPU), avg(ll_processCPU)Now consider the following search queries:
- Query1
-
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, weeks(sys_eventTime) COLUMNS max(ll_systemCPU),avg(ll_processCPU)
Query1 matches the aggregation rule because all conditions match:- The USE clause in Query1 is the same as that of the aggregation rule.
- The GROUP BY non-time aggregate (ll_nodeId) in Query1 is exactly the same as that in the aggregation rule.
- The time aggregate column weeks(sys_eventTime) is a subset of that in the aggregation rule. So, the GROUP BY clause of Query1 is a subset of the GROUP BY clause of the aggregation rule.
- The projection aggregates in Query1, max(ll_systemCPU), avg(ll_processCPU), are a subset of that in the aggregation rule.
- Query2
-
USE LogLogic_Monitor_Cpu, Loglogic_Appliance | GROUP BY ll_nodeId, weeks(sys_eventTime) COLUMNS max(ll_systemCPU), avg(ll_processCPU)
Query2 does not match the aggregation rule because:
The USE clause in Query2 is not same as that of the aggregation rule; it has an extra data model Loglogic_Appliance.
- Query3
-
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, ll_pRuleID, weeks(sys_eventTime) COLUMNS max(ll_systemCPU), avg(ll_processCPU)
Query3 does not match the aggregation rule because:
The GROUP BY non-time aggregate in Query3 is not exactly the same as that in the aggregation rule; it has an extra column ll_pRuleID.
- Query4
-
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, minutes(sys_eventTime) COLUMNS max(ll_systemCPU), avg(ll_processCPU)
Query4 does not match the aggregation rule because:
The time GROUP BY in Query4 is not equal to or a subset of that in the aggregation rule; it has a different column minutes(sys_eventTime).
- Query5
-
USE LogLogic_Monitor_Cpu | GROUP BY ll_nodeId, minutes(sys_eventTime) COLUMNS max(ll_systemCPU), sum(ll_processCPU)
Query5 does not match the aggregation rule because:
The projection columns in Query5 are not equal to or a subset of that in the aggregation rule; it has a different column sum(ll_processCPU).