SQL Aggregate Functions
TIBCO ActiveSpaces supports the use of aggregate functions as output columns in queries.
Function | Description |
---|---|
AVG(<column>) | Computes the arithmetic mean of the non-NULL values in the column. The type of the result is always a Double, regardless of the data type of the underlying column. If the data type of the underlying column is String, the value is converted to a Double. If the data type of the underlying column is Opaque, the data is treated as a String and then converted to a Double. |
COUNT(<column>) COUNT(*) | |
MIN(<column>) | The minimum non-NULL value in the column. The type of the result matches the type of the underlying column. |
MAX(<column>) | The maximum non-NULL value in the column. The type of the result matches the type of the underlying column. |
SUM(<column>) | Computes the sum of the non-NULL values in the column. The type of the result is Long if the data type of the underlying column is Long, otherwise it is Double. If the data type of the underlying column is String, the value is converted to a Double. If the data type of the underlying column is Opaque, the data is treated as a String and then converted to a Double. |
GROUP_CONCAT( DISTINCT <column>) | Concatenates the unique values for a column in the group into a comma separated list. The DISTINCT keyword is required. The order of the values in the list is not defined. The type of the result is always a String, irrespective of the data type of the underlying column. If the data type of the underlying column is Long or Double, the value is converted to a String. If the data type of the underlying column is Opaque, the data is treated as String. |
AVG(col1 + col2) MIN(trim(col1))Whereas the following function is valid:
AVG(col1)If all of the underlying values are NULL or cannot be converted to the appropriate data type for the function, the value of the function is NULL.
Using Aggregate Functions with the WHERE Clause
If the query includes a WHERE clause, the aggregate functions are applied to the rows that match the clause. If no WHERE clause is included, the functions are applied to all the rows in the table.
Using Aggregate Functions with the GROUP BY Clause
SELECT state, AVG(salary) FROM employees GROUP BY state
What Happens when the Query Contains Both, the WHERE and the GROUP BY Clause
If the query contains both a WHERE clause and a GROUP BY clause, the WHERE clause must precede the GROUP BY clause.
As is the case with other TIBCO ActiveSpaces queries, the order of the rows in the result set is not defined unless the ORDER BY clause is used.
On large data sets, calculating the aggregate results can take a long time because large numbers of rows need to be processed before the computation can be completed. In such cases, when creating or executing the statement, it is advisable to set the TIBDG_STATEMENT_PROPERTY_DOUBLE_FETCH_TIMEOUT property accordingly.
The supported date-time functions can be used as output columns in the WHERE clause but not in the GROUP BY clause. Applications that send a large number of queries can choose a proxy binding strategy to distribute the queries across the proxies available to you.
- The Usage of Indexes When the Query Contains Aggregation
- Indexes can be used for a column that is an argument to a calculated function, or to GROUP BY, if the column has a constraint within the WHERE clause. For example, the following queries use the last_name_idx:
SELECT last_name, COUNT(last_name) AS count_last_name FROM t1 WHERE last_name LIKE 'B%' GROUP BY last_name
SELECT COUNT(last_name) AS count_last_name FROM t1 WHERE last_name LIKE 'B% 'SELECT last_name FROM t1 WHERE last_name LIKE 'C%' GROUP BY last_name
But the same queries without the WHERE constraints perform a full table scan.
Limitations
Floating Point Calculations in Aggregate Functions
By default, floating point calculations are inexact and the order in which they are carried out can subtly affect the results. When executing queries involving aggregate functions, ActiveSpaces processes the data from the different copysets in parallel leading to slight variations in the results of floating point calculations. While the variations might be very small, they can be amplified by rounding the decimal value. For example, the average of a column can be calculated as 80.2849999999999966 one time and 80.2850000000000108 another time. This variation is clearly very small (1.4e-14) but if the results are rounded to 2 decimal places, that is 80.28 and 80.29, the variation appears greater than that.