Using aggregations in OVER expressions
When the results of an expression should be an aggregated view of
the values in a column, you simply use the expression function of your choice
to determine the output. For example, when a column containing sales figures
should be summarized, you enter
Sum([Sales])
in the expression field. However, aggregated
values combined with OVER expressions allow you to split the data the way you
want it.
Nested aggregations
There are a few differences in how nested aggregations behave when used in calculated columns and in custom expressions. See below for some examples. (Copy the text below and paste it (installed client only) or save it as a text file to use as a data table in an analysis and try it out yourself.)
Region Product Sales
EAST Phone 42
EAST Phone 118
EAST Chair 190
WEST Desk 19
WEST Chair 20
CENTRAL Phone 18
CENTRAL Desk 17
CENTRAL Desk 12
CENTRAL Desk 13
CENTRAL Chair 20
In calculated columns: Max of total Sales per Region
By writing function calls within functions you can get complete control over the output of your expression. More than one aggregation can be nested, and each nested aggregation can have its own OVER expression. When the data table is filtered, the calculated column still shows values based on the whole data table.
About this task
See also Using OVER statements in calculated columns.
Below is an example of how you can use the Add calculated column tool to compute the total sales amount per region, and then find the max of these sales.
Procedure
Results
Calculation details
This is what happens within the data engine when a calculated column is added; note that the intermediate tables shown in these examples are never presented to end users.
Original data:
First the inner aggregation will be computed, and the sum of sales per region will be:
In a calculated column, the result from a calculation is always applied to all rows of the input table, hence, the resulting sales value for each region will be smeared over all the rows:
Max()
will then be computed on "Total Sales
(SMEARED)", resulting in the value 350, and because we are inserting a
calculated column, that value will also be smeared over all rows. This means
that the final result will be:
When the data table is filtered, the calculated column still shows values based on the whole data table.
In custom expressions: Sales for the best selling Product per Region and filter out one product group
Nested aggregations can also be used in custom expressions on axes in visualizations. The behavior is similar to the case with calculated columns, but there are some differences. First of all, filtering is taken into account, so all aggregations will work on the filtered subset of the data. Also, only the outermost aggregation will use the grouping defined by the visualization, while all the inner aggregations need OVER expressions to split the data. All inner aggregations will be treated the same way as calculated columns, meaning that resulting values of the inner aggregation will be added to all rows of the outer aggregation. When there is an OVER expression in an inner aggregation, all the categorical axes in the visualization will automatically be intersected with (not for unaggregated visualizations), even though this is not explicitly stated in the expression. See Calculation details below for more information.
About this task
In this example, we want to show one bar per region in a visualization, where the height of the bar is the sales for the best selling product in that region. We also want to filter out the product "Chair", assuming that it is the top product besides chairs that is of interest.
Procedure
Results
Calculation details
This is what happens within the data engine when a calculated column is added; note that the intermediate tables are never presented to end users.
Starting with the original data, you have the following table:
When "Chair" is filtered out, the data on which the calculations are based in the visualization is:
The inner aggregation will be intersected with the categorical axis,
hence, the actual expression computed will be:
Sum([Sales]) OVER Intersect([Region],[Product])
,
which will result in the following sales per (region,product)-pair:
Just as for calculated columns, inner aggregations will always be
smeared to all rows before the result is passed to the outer aggregation. This
means that the following will be the input to the
Max()
function:
Since this example is calculated as a custom expression in a
visualization, the outermost aggregation (Max()
) will use the
grouping defined by the visualization. This means that the actual expression
calculated will be based on the filtered, smeared rows, which will result in
the following values for the bars:
Because the "Chair" category was filtered out, the top remaining category in the east region was "Phone" and the top category in the west region was "Desk". If the filtering is changed, the resulting top category changes as well.
UniqueConcatenate(If(Max(Sum([Sales]) OVER (Intersect([Product]))) OVER (Intersect())=Sum([Sales]) OVER (Intersect([Product])),[Product],null))