The GROUP BY Clause

A GROUP BY clause is used to arrange rows with identical values in columns into groups.

The GROUP BY clause immediately follows the WHERE clause of a SQL SELECT statement. For more information about the syntax of the SQL SELECT statement, see The Syntax of the SELECT Statement. The following code snippet is an example of the syntax of the GROUP BY clause:
GROUP BY <grouping term>[, <grouping term>] . . .
A grouping term is a reference to a value expression from the select list. The following query is invalid because state is not in the select list:
SELECT AVG(salary) FROM employees GROUP BY state
A GROUP BY clause is most often used with aggregation functions in the select list to provide summary information about each group of rows. For example, the following query returns a set of rows - one for each state, with each row containing two columns, the state and the mean salary for employees in that state:
SELECT state, AVG(salary) FROM employees GROUP BY state
A value expression, which is not an aggregate function, must normally be included as a grouping term in the GROUP BY clause. However, this is not a requirement.

Limitations

Currently, the GROUP BY clause can only be applied to value expressions that are column identifiers. Expressions and functions in the select list cannot be used as grouping terms. For example, the following query is invalid as it refers to the label of a function in the select list:
SELECT date(dtm) AS Year, COUNT(dtm) FROM mytable GROUP BY Year

Performance

When using a GROUP BY clause, the performance of the aggregation depends on how many rows are used in the aggregation. A query that includes aggregation but does not contain a WHERE clause performs aggregation using all the rows from the table and must be avoided, if possible.

For the best aggregation performance, the column used as an argument to your GROUP BY clause must have an index defined that uses that column. Your query must include a WHERE clause that causes the index to be selected for the query. For example, if you have a secondary index named lastname_idx defined for the lastname column, the following queries use lastname_idx to reduce the number of rows in the query on which aggregation is performed:
SELECT lastname FROM mytable WHERE lastname LIKE 'B%' GROUP BY lastname
For more information on query performance see Tips on Constructing an Efficient WHERE Clause.