Query TopN Sample

About This Sample

Suppose you want to extract a range of values from a query table, such as the ten highest- or lowest-priced stocks, or the best- or worst-selling items in an inventory. One method is to use a Query operator to repeatedly read all the rows in the table, and then use a series of other operators to process the result, comparing prices across stock symbols or products to find the top n. Alternatively, you could pre-process tuples upstream from a query table that has just one row containing n fields: each time a tuple arrives on the input stream, compare its values against the stored ones to see if the table needs to be updated.

Both of these options require complicated processing to compare the values and continually recalculate the top n values. The QueryTopN sample application demonstrates a simpler method that combines b-tree indexing on the table with the Query operator's built-in option to limit the number of output rows.
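To make the per-tuple bookkeeping of the second alternative concrete, here is a minimal Python sketch. It is not part of the sample and is not StreamBase code. It assumes tuples carry a value and a symbol, and it substitutes a bounded min-heap for the one-row table with n fields. The function name update_top_n is hypothetical.

```python
import heapq


def update_top_n(top_n, n, value, symbol):
    """Keep the n largest (value, symbol) pairs seen so far.

    top_n is a min-heap (a plain list managed by heapq), so top_n[0] is
    always the smallest entry currently kept. Returns True if the kept
    set changed, False if the new tuple was discarded.
    """
    entry = (value, symbol)
    if len(top_n) < n:
        # Fewer than n entries stored so far: always keep the new one.
        heapq.heappush(top_n, entry)
        return True
    if entry > top_n[0]:
        # New entry beats the current smallest: swap it in.
        heapq.heapreplace(top_n, entry)
        return True
    return False


# Usage: feed arriving tuples one at a time, then read the result
# in descending order.
kept = []
for symbol, value in [("A", 10), ("B", 5), ("C", 7), ("D", 6), ("E", 12)]:
    update_top_n(kept, 3, value, symbol)
ranked = sorted(kept, reverse=True)
```

Even in this compact form, every arriving tuple has to be compared against the stored state, and n is fixed in advance. The indexed-table approach used by the sample avoids both.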

The sample has one input stream, enterN, in which you enter the value of n as the int field howMany. The table is populated from a CSV data file containing randomly generated values, but in a real application, data from an input stream or another module would update the table dynamically.

The table is indexed by the field value in descending order, so that the first n entries of the index are always the largest ones. Sending a tuple on the enterN input stream triggers a read operation that outputs just the current n highest values. It does this by setting the Limit field in the Query operator to the input value howMany.
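The idea of reading the first n entries of a descending index can be sketched in Python as follows. This is an illustration only, not the sample's implementation: a sorted list stands in for the b-tree index, and the class and method names (DescendingIndexTable, upsert, read_top) are hypothetical.

```python
import bisect
from itertools import islice


class DescendingIndexTable:
    """Toy stand-in for a query table with an ordered index on value."""

    def __init__(self):
        # Entries are stored as (-value, symbol), so the list's ascending
        # order corresponds to descending order of value.
        self._index = []

    def upsert(self, symbol, value):
        # Drop any existing row for this symbol, then insert the new row
        # at its sorted position.
        self._index = [e for e in self._index if e[1] != symbol]
        bisect.insort(self._index, (-value, symbol))

    def read_top(self, how_many):
        # Analogous to a read with a row limit: take only the first
        # how_many entries of the descending index.
        return [
            {"symbol": symbol, "value": -neg_value}
            for neg_value, symbol in islice(self._index, how_many)
        ]


# Usage: populate the table, then ask for the current top 2.
table = DescendingIndexTable()
table.upsert("AAA", 10.0)
table.upsert("BBB", 25.5)
table.upsert("CCC", 17.0)
top_two = table.read_top(2)
```

Because the ordering is maintained when rows are written, a read never sorts or scans the whole table. It stops after how_many entries, which is what the Limit setting achieves in the sample.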

The tuples output from the table are split into two streams and processed by an Aggregate operator and a Map operator, respectively. The Aggregate operator uses aggregatelist(tuple(...)) in a predicate dimension to generate a list of the top n tuples. The dimension has only a Close expression, count()=howMany, so the window closes and emits the list once n tuples have been collected. The Map operator restores the original field names and drops the input field howMany to output n individual tuples on the lower stream.
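The two output branches can be approximated in Python as below. This is a rough sketch under stated assumptions, not EventFlow code: each row read from the table is modeled as a dict carrying symbol, value, and the howMany value from the triggering input tuple. The function names are hypothetical. The field renaming done by the Map operator is omitted because the intermediate field names are not given here.

```python
def aggregate_to_list(rows):
    """Collect rows into one list, closing the window when the number of
    collected rows equals howMany (mirrors count()=howMany)."""
    window = []
    for row in rows:
        window.append({"symbol": row["symbol"], "value": row["value"]})
        if len(window) == row["howMany"]:
            yield window
            window = []


def map_rows(rows):
    """Emit each row individually with the howMany field dropped."""
    for row in rows:
        yield {k: v for k, v in row.items() if k != "howMany"}


# Usage: two rows returned for howMany = 2.
rows = [
    {"symbol": "BBB", "value": 25.5, "howMany": 2},
    {"symbol": "CCC", "value": 17.0, "howMany": 2},
]
as_list = list(aggregate_to_list(rows))
as_tuples = list(map_rows(rows))
```

In this sketch, the list branch emits nothing if fewer than howMany rows arrive, because the close condition is never met.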

Importing This Sample into StreamBase Studio

This sample is part of the operator samples. In StreamBase Studio, import the operator samples with the following steps:

  • From the top-level menu, click File>Import Samples and Community Content.

  • Enter sample group to narrow the list of options.

  • Select Operator sample group from the Data Constructs and Operators category.

  • Click Import Now.

StreamBase Studio creates a single project containing all the operator samples.

Running This Sample in StreamBase Studio

  1. In the Project Explorer view, open the sample you just loaded.

    If you see red marks on a project folder, wait a moment for the project to load its features.

    If the red marks do not resolve themselves after a minute, select the project, right-click, and select Maven>Update Project from the context menu.

  2. Open the src/main/eventflow/com.tibco.sb.sample.operator folder.

  3. Open the QueryTopN.sbapp file and click the Run button. This opens the SB Test/Debug perspective and starts the module.

  4. In the Output Streams view, make sure that All Output Streams is selected in the Output stream control.

  5. Enter 1 for howMany (the number of values you want to output) in the Manual Input view and click Send Data.

  6. Observe the output streams in the Output Streams view. Note that:

    1. The topNtuples stream contains one tuple having fields value and symbol. This tuple holds the highest value in the table.

    2. The topNlist stream contains one tuple, a list containing the above tuple.

  7. Repeat steps 5 and 6, increasing howMany to 2, 3, and so on, to see the set of top n values grow.

  8. When done, press F9 or click the Terminate EventFlow Fragment button.

Sample Location

When you load the sample into StreamBase® Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.

Important

Load this sample in StreamBase® Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.

Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:

studio-workspace/sample_operator

See Default Installation Directories for the default location of studio-workspace on your system.