SQL Aggregate Functions

TIBCO ActiveSpaces supports the use of aggregate functions as output columns in queries.

The aggregate functions supported are listed in the following table:

Function	Description
AVG(<column>)	Computes the arithmetic mean of the non-NULL values in the column. The type of the result is always a Double, regardless of the data type of the underlying column. If the data type of the underlying column is String, the value is converted to a Double. If the data type of the underlying column is Opaque, the data is treated as a String and then converted to a Double.
COUNT(<column>) COUNT(*)	Computes the number of non-NULL values in the column, or the number of rows. The result is always of type Long. Note: COUNT() is not supported.
MIN(<column>)	The minimum non-NULL value in the column. The type of the result matches the type of the underlying column.
MAX(<column>)	The maximum non-NULL value in the column. The type of the result matches the type of the underlying column.
SUM(<column>)	Computes the sum of the non-NULL values in the column. The type of the result is Long if the data type of the underlying column is Long, otherwise it is Double. If the data type of the underlying column is String, the value is converted to a Double. If the data type of the underlying column is Opaque, the data is treated as a String and then converted to a Double.
GROUP_CONCAT( DISTINCT <column>)	Concatenates the unique values for a column in the group into a comma separated list. The DISTINCT keyword is required. The order of the values in the list is not defined. The type of the result is always a String, irrespective of the data type of the underlying column. If the data type of the underlying column is Long or Double, the value is converted to a String. If the data type of the underlying column is Opaque, the data is treated as String.

Aggregate functions can only be applied to a single column and cannot be applied to expressions or functions. For example, the following functions are not valid:

AVG(col1 + col2) 
MIN(trim(col1))

Whereas the following function is valid:

AVG(col1)

If all of the underlying values are NULL or cannot be converted to the appropriate data type for the function, the value of the function is NULL.

Using Aggregate Functions with the WHERE Clause

If the query includes a WHERE clause, the aggregate functions are applied to the rows that match the clause. If no WHERE clause is included, the functions are applied to all the rows in the table.

Using Aggregate Functions with the GROUP BY Clause

ActiveSpaces supports the GROUP BY clause and results in the aggregate functions being applied to the rows that match unique combinations of the GROUP BY columns. For example, the following query returns a set of rows - one for each state, with each row containing two columns, the state, and the mean salary for employees in that state:

SELECT state, AVG(salary) FROM employees GROUP BY state

If the query includes a GROUP BY clause, all the columns in the GROUP BY clause must be included as output columns. For instance, the following query is not valid because state is not one of the output columns:

SELECT AVG(salary) FROM employees GROUP BY state

Note: The GROUP BY clause can only be applied to a single column and cannot be applied to expressions or functions. The following code snippet is an invalid usage of the GROUP BY clause:

SELECT date(dtm) AS Year,COUNT(dtm) FROM t1 GROUP BY Year

What Happens when the Query Contains Both, the WHERE and the GROUP BY Clause

If the query contains both a WHERE clause and a GROUP BY clause, the WHERE clause must precede the GROUP BY clause.

As is the case with other TIBCO ActiveSpaces queries, the order of the rows in the result set is not defined unless the ORDER BY clause is used.

On large data sets, calculating the aggregate results can take a long time because large numbers of rows need to be processed before the computation can be completed. In such cases, when creating or executing the statement, it is advisable to set the TIBDG_STATEMENT_PROPERTY_DOUBLE_FETCH_TIMEOUT property accordingly.

The supported date-time functions can be used as output columns in the WHERE clause but not in the GROUP BY clause. Applications that send a large number of queries can choose a proxy binding strategy to distribute the queries across the proxies available to you.

The Usage of Indexes When the Query Contains Aggregation

Indexes can be used for a column that is an argument to a calculated function, or to GROUP BY, if the column has a constraint within the WHERE clause. For example, the following queries use the last_name_idx:

SELECT last_name, COUNT(last_name) AS count_last_name FROM t1 WHERE last_name LIKE 'B%' GROUP BY last_name

SELECT COUNT(last_name) AS count_last_name FROM t1 WHERE last_name LIKE 'B%
'SELECT last_name FROM t1 WHERE last_name LIKE 'C%' GROUP BY last_name

But the same queries without the WHERE constraints perform a full table scan.

Limitations

The following clauses are not supported:

HAVING
DISTINCT (Except for GROUP_CONCAT( DISTINCT <column>))
ALL

The following keywords cannot be passed as an argument to the aggregate functions:

DISTINCT (Except for GROUP_CONCAT( DISTINCT <column>))
ALL

For example, the following statements are invalid:

SELECT COUNT(DISTINCT city) FROM t1 
SELECT COUNT(ALL city) FROM t1

Floating Point Calculations in Aggregate Functions

By default, floating point calculations are inexact and the order in which they are carried out can subtly affect the results. When executing queries involving aggregate functions, ActiveSpaces processes the data from the different copysets in parallel leading to slight variations in the results of floating point calculations. While the variations might be very small, they can be amplified by rounding the decimal value. For example, the average of a column can be calculated as 80.2849999999999966 one time and 80.2850000000000108 another time. This variation is clearly very small (1.4e-14) but if the results are rounded to 2 decimal places, that is 80.28 and 80.29, the variation appears greater than that.

Related concepts

Date and Time Functions

SQL String Functions

Contents