Aggregate Operator: Field-Based Dimension Options

To specify a field-based dimension, select Field from the Type drop-down list in the Edit Dimension dialog. The drop-down list labeled Field then shows all fields in the incoming tuple of type int, long, double, or timestamp (interpreted as an interval timestamp in seconds).

With a field-based dimension, a new window is established and evaluated based on the value of a numeric field in the incoming tuple. Typically, when a tuple arrives whose field value exceeds the specified range of the open window, the window emits a tuple containing the results of the aggregation and then closes. The arriving tuple that triggered the close is placed in the next window.

A window also closes and emits when the period specified in the Close and emit after expression elapses, even if no tuple arrives. (Note that prior to StreamBase release 7.4, an input tuple was required to close any time-based window in a Field or Time dimension.)

The field you select is typically a timestamp field, or a numeric index or count that behaves like a timestamp. The values of any such field are assumed to increase with each new tuple, but not necessarily in a regular fashion. A runtime OutOfOrder exception occurs if the input values for the selected numeric field do not increase monotonically. The window for a dimension based on a timestamp field can be set to emit and close based on elapsed time. For more information and a caveat about using time-based aggregation windows, see Aggregate Operator: Time-Based Dimension Options.

The Edit Dimension dialog for a field-based dimension has the following appearance.

In this example, even though the field dimension identified here as AggregateTradesDim represents time and its values are assumed to increase monotonically, it is not of data type timestamp, but is relative to some unspecified base time.

Using the group-as option, the Aggregate Operator Field Dimension Sample aggregates trading volume on a per-stock-symbol basis over time. Its input stream has the schema {Time: int, Symbol: string, Volume: int}. Its output stream has the schema {Symbol: string, TimeChunk: int, TotalVolume: int}. This Aggregate operator field dimension opens a window when the first tuple arrives and receives tuples for 30 seconds, until the next arriving tuple triggers the window to emit and close. The dimension then opens a new window immediately.

The following table illustrates the behavior of this field-based dimension with windows of Size: 30, Advance: 30 and Offset: 0, based on field Time and grouping by field Symbol, as a sequence of input events arrives for the stock symbols AMAT and INTC.

Tuple | Field Values | Open? | Emit? | Close?
1 | 10, AMAT, 100 | Yes (Window 1) | - | -
2 | 20, AMAT, 200 | - | - | -
3 | 40, AMAT, 100 | Yes (Window 2) | Yes (Window 1): Symbol AMAT TimeChunk 0 TotalVolume 300 | Yes (Window 1)
4 | 41, AMAT, 100 | - | - | -
5 | 45, INTC, 100 | Yes (Window 3) | - | -
6 | 50, AMAT, 200 | - | - | -
7 | 55, INTC, 300 | - | - | -
8 | 65, AMAT, 100 | Yes (Window 4) | Yes (Windows 2 and 3): Symbol AMAT TimeChunk 30 TotalVolume 400; Symbol INTC TimeChunk 30 TotalVolume 400 | Yes (Windows 2 and 3)

To summarize this flow of events:

  • Tuple 1 arrives. No windows exist, so one is created to receive it.

  • Tuple 2 comes 10 seconds later and is added to window 1.

  • When tuple 3 arrives, its Time value is 40, which is beyond the range of the first window (size 30, starting at 0). The first window emits the tuple {Symbol=AMAT, TimeChunk=0, TotalVolume=300} and closes. A new window opens to hold tuple 3.

  • Tuple 4 arrives with values {41, AMAT, 100} and enters window 2.

  • Tuple 5 arrives. Its Symbol value (INTC) is new, which causes window 3 to open to hold a new group.

  • Tuples 6 and 7 arrive, and enter their respective group windows. As Time has not advanced by more than 30 seconds for either open window, no calculations or emissions occur.

  • Tuple 8 arrives with values {65, AMAT, 100}. As the Time value is now greater than 60, windows 2 and 3 both calculate and emit values, and then close. Window 4 opens to receive tuple 8.
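The flow of events above can be traced with a small simulation. This is a minimal sketch in Python, not StreamBase code and not the operator's actual implementation: the function name aggregate_by_field and the list SAMPLE are hypothetical, and the sketch assumes that Size equals Advance (non-overlapping windows) and that each window covers the range from its open value up to, but not including, open value plus Size.

```python
# Input tuples from the table above: (Time, Symbol, Volume).
SAMPLE = [
    (10, "AMAT", 100), (20, "AMAT", 200), (40, "AMAT", 100),
    (41, "AMAT", 100), (45, "INTC", 100), (50, "AMAT", 200),
    (55, "INTC", 300), (65, "AMAT", 100),
]

def aggregate_by_field(tuples, size=30, advance=30, offset=0):
    """Simplified model of a grouped field dimension with Size == Advance."""
    open_windows = {}   # Symbol -> [open value, running TotalVolume]
    emitted = []        # (Symbol, TimeChunk, TotalVolume) in emission order
    for time, symbol, volume in tuples:
        # An arriving Time value closes every window whose range it has
        # passed, in every group, not only in the arriving tuple's group.
        for sym in list(open_windows):
            openval, total = open_windows[sym]
            if time >= openval + size:
                emitted.append((sym, openval, total))
                del open_windows[sym]
        # Open a window for this group if it has none; the open value is
        # Offset plus the largest multiple of Advance not exceeding Time.
        if symbol not in open_windows:
            openval = offset + ((time - offset) // advance) * advance
            open_windows[symbol] = [openval, 0]
        open_windows[symbol][1] += volume
    return emitted

for row in aggregate_by_field(SAMPLE):
    print(row)
```

Under these assumptions the sketch emits the same three result tuples as the table: one for window 1 when tuple 3 arrives, and two for windows 2 and 3 when tuple 8 arrives. Window 4 is still open when the input ends, so it emits nothing.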

The following table describes the options available in the Edit Dimension dialog for field-based dimensions.

Category Options and Meaning
Field The drop-down list for this field shows all fields in the incoming schema that have the StreamBase data type of int, double, long, or timestamp (interpreted as an interval timestamp in seconds). Base the dimension only on input fields you know will contain monotonically increasing values.
Opening policy: Select one of these options:
  • Do not open window based on this dimension: If selected, this dimension cannot cause the opening of a new window for the Aggregate. Tuples will enter whatever window is currently open.

    Note that a window will in fact open when the first tuple arrives if no other dimension has opened a window, as there must always be an open window to receive tuples.

  • Open per:

    • Advance: A simple numeric expression describing the amount by which to advance the window. Think of this setting as a slider for multiple windows.

      Note

      When you set an advance, the results can be sensitive to how the values actually received at runtime increase. For example, consider a dimension where the window size and advance are both 1, and at runtime two tuples arrive in sequence with the values 0 and 9.2. The first window opens at the value 0, as expected. However, the second window opens at 9.0, even though your advance is set to 1. This is because the Aggregate operator automatically advances the start of your next window to match the next value received. The values between 0 and 9 represent empty windows, and are ignored by the Aggregate operator.

      If you do not specify a Window Size (Close and emit value), you cannot specify a value for Advance, because the window will not close.

    • Offset: A simple numeric expression describing the value by which to shift the start of windows. Windows start at integer multiples of the Advance value, shifted forward by the Offset value (or backward, if Offset is negative). By default, the value of Offset is 0.

      For example, consider a field-based dimension based on a timestamp field, with a window size of 30 (seconds) and an Advance of 30. With no offset, we might see a series of windows at the following start intervals (based on the timestamps of the input tuples):

      0, 30, 60, 90...
      

      With an offset of 3, the same windows would have these starting timestamps:

      3, 33, 63, 93...
      

      Window openings are further influenced by whether the Open windows before first tuple option is set, as described below.

    Note that the Aggregate operator opens a window when the first input tuple arrives if no dimension has already opened one.

    The units for the values of Advance and Offset depend on the data type of the selected field. For example, if the selected field has the timestamp data type, an Advance value of 30 advances the window by 30 seconds.

    You cannot specify a value for Advance that is greater than the value of Window size.

Window size: Select one of these options:
  • Do not close window based on this dimension (default): When selected, this dimension cannot cause the closing of a window for the Aggregate. If a new window is never opened, and Do not close... is selected, this creates an unlimited size window that never closes for the life of the application.

  • Close and emit after [number]: When selected, sets the size of the window. number can be a simple expression that resolves to an integer value. The window is closed and an aggregation results tuple is emitted when a tuple is received with a Field value exceeding the number specified. The tuple whose value causes the window to close is not included in its results, but may be added to other windows based on currently operative rules.

    When the Window size is the same as the Advance, only one window is open at a time for each group. When the Advance is less than the Window size, windows overlap each other. For example, with a window size of 30 seconds and an Advance of 15, windows would be created every 15 seconds and stay open for 30 seconds.

Emission policy: Select one of these options:
  • No intermediate emissions based on this dimension: When selected, this dimension does not force an immediate emission of a results tuple.

  • Intermediate emission every [number]: When selected, allows tuples to be emitted before the window closes. number can be a simple expression that resolves to an integer value. For example, if the selected field has the timestamp data type and the Window size is set to 8, you could specify an intermediate emission every 4 seconds, instead of waiting for the Window size limit to be met.

Optional windows: Select one of these options:
  • Open only a single window for the first event or following a gap in values — When selected (the default setting), the first window is not created until the specified Window size is reached, and the operator emits tuples only when windows meet the Window Size and Emission Policy criteria described above.

  • Open all windows for the first event or following a gap in values: Selecting this option only has an effect when the Advance value is less than the Window size value. In this case, the results depend on whether you specify any group-by fields in the Group Options tab:

    • With no grouping specified, this option's effect is to open the first window immediately and begin to accumulate any tuples that arrive before the Window size is reached. One or more extra tuples are emitted at the beginning, and after any gap in the accumulating values.

    • With groups specified, this option's effect is to open the first window immediately as above, and to emit extra tuples for partial, intermediate windows.


By default, at most one new window is created when the first tuple arrives or the difference between the current tuple and the previous tuple values (for the field on which the dimension is based) is greater than or equal to the specified Window size. The openval() for that window and subsequent windows is an integer multiple of Advance, plus Offset, and is equal to or less than the current tuple's value.

However, more than one window can be created if Open all windows for the first event or following a gap in values is selected under Optional windows, and the value you specified for Advance is smaller than that for Window size.
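The rule for window open values can be written out as arithmetic. The following Python sketch is illustrative only: the function names latest_openval and openvals_covering are hypothetical, and it assumes that a window covers values from its open value up to, but not including, open value plus Window size.

```python
import math

def latest_openval(value, advance, offset=0):
    """Largest integer multiple of Advance, plus Offset, that is <= value."""
    return offset + math.floor((value - offset) / advance) * advance

def openvals_covering(value, size, advance, offset=0):
    """Open values of every window whose range would contain value.

    Models the windows opened for a first tuple (or after a gap) when
    Open all windows is selected and Advance is less than Window size.
    """
    openvals = []
    openval = latest_openval(value, advance, offset)
    # Step backward one Advance at a time while the window still reaches value.
    while openval + size > value:
        openvals.append(openval)
        openval -= advance
    return sorted(openvals)
```

With these definitions, latest_openval(9.2, 1) gives 9, matching the Advance note above, and latest_openval(35, 30, 3) gives 33, matching the Offset example. For a first value of 1 with Window size 7 and Advance 1, openvals_covering(1, 7, 1) gives the seven open values from -5 through 1. With Window size 30 and Advance 15, openvals_covering(40, 30, 15) gives 15 and 30, the two overlapping windows that contain the value 40.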

Example: The Effect of Optional Windows

Suppose you have an application that calculates one-week moving averages of daily temperature records for localities. The input stream schema is: {Day int, City string, Low double, Average double, High double}. The application uses a field-based Aggregate operator to compute averages of the three temperature measurements and outputs the base day and the number of days represented by each computed average.

The Aggregate Functions tab view looks like this:

To produce weekly moving averages, the field dimension Opening policy should advance one day at a time, based on integer field Day, and have a Window size of 7 days, with no intermediate emissions. The Edit Dimension dialog looks like this:

This creates up to seven overlapping windows. Whether results are emitted for partially full windows at the start is controlled by the Optional windows setting:

  • When Open only a single window for the first event or following a gap in values is selected, a window opens with the first tuple. Its openval is the largest integer multiple of Advance, plus Offset, that is less than or equal to that first value. The operator emits tuples when windows contain seven tuples. That is, the first emission occurs when the eighth input tuple is received. The StartDay output field, which is the dimension's openval, begins with the first value of the Day field. For a certain input stream, the output starts off as follows:

    StartDay=1 NumDays=7 LowAvg=64.4 AverageAvg=70.1 HighAvg=78.0 
    StartDay=2 NumDays=7 LowAvg=64.7 AverageAvg=70.7 HighAvg=78.3 
    StartDay=3 NumDays=7 LowAvg=65.4 AverageAvg=72.4 HighAvg=79.9 
    ...
    
  • If Optional windows is changed to Open all windows for the first event or following a gap in values, a new window opens for each input tuple because Advance is set to 1. Emissions begin when the second tuple is received. The first emission averages one value, the second two values, and so on, until all windows contain seven tuples. The first StartDay output field, which indicates the dimension's openval, has a value of -5. At the seventh iteration, all windows are full and the output is the same as the single window case above. For the same input stream as above, the output starts out as follows:

    StartDay=-5 NumDays=1 LowAvg=62.0 AverageAvg=68.0 HighAvg=78.0 
    StartDay=-4 NumDays=2 LowAvg=61.5 AverageAvg=67.0 HighAvg=75.5 
    StartDay=-3 NumDays=3 LowAvg=61.7 AverageAvg=67.0 HighAvg=75.0 
    StartDay=-2 NumDays=4 LowAvg=62.8 AverageAvg=68.8 HighAvg=76.5 
    StartDay=-1 NumDays=5 LowAvg=64.6 AverageAvg=70.6 HighAvg=79.0 
    StartDay=0 NumDays=6 LowAvg=64.8 AverageAvg=70.5 HighAvg=78.8 
    StartDay=1 NumDays=7 LowAvg=64.4 AverageAvg=70.1 HighAvg=78.0 
    StartDay=2 NumDays=7 LowAvg=64.7 AverageAvg=70.7 HighAvg=78.3 
    StartDay=3 NumDays=7 LowAvg=65.4 AverageAvg=72.4 HighAvg=79.9 
    ...
    

In both cases, a new window opens for every tuple received, and events after the first tuple enter multiple windows. In both cases, windows do not emit until the operator has received seven tuples. In the second case, however, not all open windows receive new tuples; the oldest window stops receiving tuples whenever a new window opens, so the first six windows have fewer than seven events to average.
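The StartDay and NumDays columns of the two listings can be reproduced with a short simulation. This is a simplified Python model, not the operator's implementation: the function name moving_windows is hypothetical, it assumes one input tuple per consecutive day with no grouping, and it assumes a window closes when a Day value arrives that is at or beyond its open value plus Window size. Temperature averages are omitted because the input data is not given here.

```python
def moving_windows(days, size=7, advance=1, offset=0, open_all=False):
    """Return (StartDay, NumDays) pairs in emission order."""
    windows = {}    # open value -> number of tuples received so far
    emitted = []
    for day in days:
        # Close and emit windows whose range the arriving value has passed.
        for openval in sorted(windows):
            if day >= openval + size:
                emitted.append((openval, windows.pop(openval)))
        # Open values of all windows whose range would contain this day.
        newest = offset + ((day - offset) // advance) * advance
        covering = []
        openval = newest
        while openval + size > day:
            covering.append(openval)
            openval -= advance
        if windows:
            # Normal case: open only windows newer than those already open.
            to_open = [o for o in covering if o > max(windows)]
        elif open_all:
            # First tuple or gap, with Open all windows selected.
            to_open = covering
        else:
            # First tuple or gap, with Open only a single window selected.
            to_open = [newest]
        for o in to_open:
            windows[o] = 0
        # The arriving tuple enters every open window.
        for o in windows:
            windows[o] += 1
    return emitted

print(moving_windows(range(1, 11)))                 # single window setting
print(moving_windows(range(1, 11), open_all=True))  # all windows setting
```

For days 1 through 10, the single-window run yields start days 1, 2, and 3, each with seven days. The all-windows run first yields start days -5 through 0 with one through six days, then the same three full windows, which mirrors the two listings above.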

Run the Sample

To learn more about how field-based aggregation works, run the Aggregate Operator Field Dimension Sample. It contains an Aggregate operator that sums the volume of trades of particular stocks over 30-second windows, advancing every 30 seconds, as described above. Extend the sample by computing the average volume per symbol in each group window and adding it to the output. Click the green Plus Sign on the Aggregate Functions tab and add an output field named AverageVolume, produced by the expression: avg(Volume). You can also, in the Edit Dimension dialog, change the Advance value from 30 to 20. Setting Advance to be less than Window Size creates sliding windows with overlapping contents.
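To check the values that an added AverageVolume field should produce, the per-window figures can be computed by hand. The Python sketch below is a batch calculation rather than a streaming one, and it is only an approximation of the sample: the names volume_stats and SAMPLE are hypothetical, the input reuses the eight tuples from the table earlier in this topic, and it assumes the original Size of 30 and Advance of 30 so that each tuple falls in exactly one window.

```python
from collections import defaultdict

# (Time, Symbol, Volume) tuples from the table earlier in this topic.
SAMPLE = [
    (10, "AMAT", 100), (20, "AMAT", 200), (40, "AMAT", 100),
    (41, "AMAT", 100), (45, "INTC", 100), (50, "AMAT", 200),
    (55, "INTC", 300), (65, "AMAT", 100),
]

def volume_stats(tuples, size=30):
    """Map (Symbol, TimeChunk) to (TotalVolume, AverageVolume)."""
    buckets = defaultdict(list)
    for time, symbol, volume in tuples:
        # TimeChunk is the start of the non-overlapping window holding Time.
        chunk = (time // size) * size
        buckets[(symbol, chunk)].append(volume)
    return {
        key: (sum(volumes), sum(volumes) / len(volumes))
        for key, volumes in buckets.items()
    }

for key, stats in volume_stats(SAMPLE).items():
    print(key, stats)
```

Because this is a batch calculation, it also reports the AMAT window that starts at 60, which in the running sample would still be open and would not yet have emitted.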