GROUP BY Statement

Grouping can be used to group values by one or more expressions involving columns.

Grouping requires a list of grouping expressions and the list of aggregation columns.

A group by expression can be a column name or an expression involving multiple columns, and an optional list of aggregation functions after the COLUMNS keyword. All the group by expressions and the aggregates listed after the COLUMNS keyword are displayed by the query.

<groupByStatment> ::= "GROUP BY" <columnExpression> ( "," <columnExpression> )* )?
(COLUMNS <aggregationFunction> ("," <aggregationFunction> )* )?;

The following aggregation functions are supported:

  • COUNT(*): Count all the rows.
  • COUNT(columnName): Count all the rows in which the value of the column is not null.
  • COUNT(DISTINCT columnName): Count all distinct values from the column.
  • SUM(column): Sum of all values from the column. Supports numerical types (Integer, Long, Double).
  • AVG(column): Provide average value for the column. Supports numerical types (Integer, Long, Double).
  • MIN(column): Smallest value of the column. Supports all data types that can be ordered (Integer, Long, Double, Timestamp, String).
  • MAX(column): Largest value of the column. Supports all data types that can be ordered (Integer, Long, Double, Timestamp, String).
  • DURATION(timestamp): Returns the difference (in milliseconds) between the latest and the earliest time. Supports Timestamp only.
  • Time functions: Groups events by time. Supports time functions (Seconds, Minutes, Hours, Days, Weeks, Months, Years).

Examples

Grouping Expression Definition
group by ll_sourceUser columns count(*) The result has two columns, the ll_sourceUser and count of users per distinct value.
group by ll_sourceUser columns

count(ll_sourceUser),

min(sys_eventTime), max(sys_eventTime)

The result has 4 columns ll_sourceUser, number of users for each distinct value of source user, minimum value of sys_eventTime and maximum value of sys_eventTime.
group by ll_sourceUser columns

Duration(sys_eventTime)

The result has 2 columns, the source user and the duration.
group by ToLong(sys_e ventTime)/1000 COLUMN ToLong(sys_e ventTime)/1000,

AVG(LENGTH(sys_body)), COUNT(*)

The result has three columns, ToLong(sys_eventTime)/1000, AVG(LENGTH(sys_body)), and COUNT(*). Grouping is done using the value of the expression in the first column, which results in events being grouped by the second at which they occurred. The next column shows the average length of the events every second. The last column shows the number of events that occurred every second.