StreamSQL Tutorial

Introduction

This topic describes how to build a StreamSQL application in StreamBase Studio by recreating the Market Feed Monitor application available in the StreamBase Studio SB Demos perspective. If you load this example from the SB Demos perspective, it creates a project named demo_Market Feed Monitor in the Package Explorer within the SB Authoring perspective. From the SB Test/Debug perspective, either the EventFlow or StreamSQL version of the application can be run.

Planning the Application

Before developing the StreamSQL application, let's review the data processing steps that must be performed. The EventFlow version of the application, shown in the following figure, lets you see the entire application at a glance. It is a good idea to run the provided Market Feed Monitor demo before working through the remainder of this tutorial.

Input to the application is provided through an input stream named TicksIn, and the schema associated with the incoming tuples includes five fields:

  • Symbol, a string field of maximum length 25 characters that contains the symbol for a stock being traded;

  • SourceTimestamp, a timestamp field containing the time at which the tuple was generated by the source application;

  • BidPrice, a double field containing the price currently being offered by buyers of this stock;

  • AskPrice, a double field containing the price currently being sought by sellers of this stock; and

  • FeedName, a string field of maximum length 4 that contains the name of the stock feed service that submitted this tuple.

To duplicate this step in a StreamSQL application, use the CREATE INPUT STREAM statement to define an input stream and its schema.

Immediately after receiving each tuple, the EventFlow application uses a Map operator named LocalTime to add a timestamp field to the stream. This field contains the system time on the computer running the application. To duplicate this step in a StreamSQL application, use the CREATE STREAM statement to define a named stream with an additional field. For consistency with the EventFlow application, name this additional field LocalTime. The StreamBase function now() can be used to obtain the system time. A named stream is accessible only to other statements within the StreamSQL application; it cannot be accessed by network client applications, such as those that submit tuples to input streams or retrieve tuples from output streams.

Next, a field-based Aggregate operator, TicksPerSecond, executes two aggregate functions over a one-second interval. The first function obtains the system time at the beginning of the interval, while the second counts the number of tuples that passed through the operator during the interval. Separate calculations are performed for each stock feed service. The output stream from this operator includes three fields, but only one of them, FeedName, is derived from the input stream. The other two fields are derived from the aggregate functions executed by this operator. Since the other fields in the original input stream are not used by the application, it would have been slightly more efficient to use the preceding Map operator to drop those fields from the stream rather than pass them to this Aggregate operator. To duplicate this step in a StreamSQL application, you can employ a CREATE WINDOW statement, a SELECT statement, and a CREATE STREAM statement. However, these statements are easily combined into a single statement, and that is the approach taken later in this tutorial.
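For reference, the separate-statement form might look like the following sketch, which declares a named one-second window and then applies it in the aggregate SELECT. This is only a sketch: the window name OneSecondWindow is an arbitrary choice, and the exact CREATE WINDOW syntax and the way a named window is referenced in the FROM clause should be confirmed in the StreamSQL Guide.

-- Declare a one-second window over LocalTime, partitioned by feed
-- (hypothetical window name).
CREATE WINDOW OneSecondWindow (SIZE 1 ON LocalTime PARTITION BY FeedName);

-- Apply the named window in the aggregate SELECT.
CREATE STREAM TicksPerSecond AS
    SELECT openval() AS StartOfTimeSlice, count() AS NumberTicks, FeedName
    FROM TicksWithTime [OneSecondWindow]
    GROUP BY FeedName;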

After calculating the number of tuples submitted during each one second interval, use another field-based Aggregate operator, Mean20s, to calculate some stream statistics over a twenty second interval. The output stream from this operator includes the FeedName field and four fields derived from aggregate functions executed by this operator. To duplicate this step in a StreamSQL application, you again need to employ a CREATE WINDOW statement, a SELECT statement, and a CREATE STREAM statement.

Finally, output tuples from the Mean20s Aggregate operator are sent both to the TickStats output stream, where they are available to external client applications, and to the Map and Filter operators, SetThreshold and TickFallOffFilter, which generate an alert on the TickFallOffAlert output stream. Within a StreamSQL application, a CREATE OUTPUT STREAM statement duplicates the EventFlow application's TickStats output stream. There are several ways to replicate the functionality of the Map and Filter operators and the second output stream. Perhaps the most elegant approach is to use a stream-valued expression (also called a subquery) and the arrow operator to pass results between subqueries; this tutorial uses an intermediate named stream instead, but a subquery sketch is shown below.
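For illustration, the subquery approach might look like the following sketch, which folds the threshold Map and the Filter into a single statement by placing a parenthesized SELECT in the FROM clause. This is only a sketch: it assumes StreamSQL accepts a parenthesized SELECT as a stream-valued expression in this position, and it omits the arrow operator, whose exact syntax is described in the StreamSQL Guide.

-- Add the AlertThreshold field in a subquery, then filter on it.
CREATE OUTPUT STREAM TickFallOffAlert AS
    SELECT * FROM
        (SELECT *, .75 AS AlertThreshold FROM TickStats)
    WHERE LastTicksPerSecond < AvgTicksPerSecond * AlertThreshold;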

Creating a Project

Start StreamBase Studio. See the Studio Reference Guide for instructions on using Studio and switching between perspectives.

In StreamBase Studio, load the demo:

  • Switch to the SB Demos perspective.

  • In the Select a demo drop-down list, select Financial - Market Feed Monitor.

  • Click Load Demo.

  • Instead of completing the demo, switch to the SB Authoring perspective for the next steps.

Add a StreamSQL file to the demo project folder.

  • Right-click the project folder demo_Financial - Market Feed Monitor.

  • Select New > StreamSQL Application.

  • In the New StreamBase StreamSQL dialog, enter a unique file name, such as tutorial_StreamSQL.

  • Click Finish.

StreamBase Studio creates an empty file named tutorial_StreamSQL.ssql in the demo project folder and opens it in the StreamSQL editor.

When you open a StreamSQL file in StreamBase Studio, the canvas becomes an intelligent text editor into which you enter your StreamSQL statements. The editor provides syntax checking, pop-up help, and content assistance, which is a pop-up list of entries that are valid at the current point in the file. As you enter content, syntax errors are described in the Typecheck Errors view. The syntax checker flags each statement as an error until it is complete, so use these warnings as a guide to completing the statement. You can review the full syntax for each statement in the StreamSQL Guide.

Developing the StreamSQL Application

Now enter the content into the StreamSQL file you just created, as shown in the following sections. A completed version of the StreamSQL file is included below.

Create the TicksIn Input Stream

In a single StreamSQL statement, declare the input stream and its associated tuple schema. You must define a schema that corresponds to the content of the tuples that will be submitted to this stream. The tuple's structure is defined by the developer of the client application that interacts with a StreamBase application.

CREATE INPUT STREAM TicksIn (
    Symbol string,
    SourceTimestamp timestamp,
    BidPrice double,
    AskPrice double,
    FeedName string
);

Create the TicksWithTime Stream

Create an intermediate stream that adds a field to contain the local time. This parallels the Map operator in the EventFlow version of this application.

CREATE STREAM TicksWithTime AS
    SELECT *, now() AS LocalTime FROM TicksIn;

Create the TicksPerSecond Stream

Create an aggregate stream that parallels the Aggregate operator in the EventFlow version, using a one-second window over the LocalTime field.

CREATE STREAM TicksPerSecond AS
    SELECT openval() AS StartOfTimeSlice, count() AS NumberTicks, FeedName
    FROM TicksWithTime [SIZE 1 ON LocalTime PARTITION BY FeedName]
    GROUP BY FeedName;

Create the TickStats Output Stream

This block combines creating an output stream and computing an aggregate in one statement. Like the Mean20s Aggregate operator in the EventFlow version, it defines an overlapping 20-second window on the StartOfTimeSlice field, grouped by FeedName.

CREATE OUTPUT STREAM TickStats AS
    SELECT openval() AS StartOfTimeSlice,
    avg(NumberTicks) AS AvgTicksPerSecond,
    stdev(NumberTicks) AS StdevTicksPerSecond,
    lastval(NumberTicks) AS LastTicksPerSecond,
    FeedName
    FROM TicksPerSecond [SIZE 20 ADVANCE 1 ON StartOfTimeSlice
    PARTITION BY FeedName]
    GROUP BY FeedName;

Create the SetThreshold Stream

This statement adds the AlertThreshold field to the stream. This is parallel to adding a field using a Map operator in EventFlow.

CREATE STREAM SetThreshold AS
    SELECT *, .75 AS AlertThreshold FROM TickStats;

Create the TickFallOffAlert Output Stream

Finally, combine a filter with creating an output stream, using a WHERE clause to emit an alert whenever the latest ticks-per-second value falls below the threshold fraction of its 20-second average.

CREATE OUTPUT STREAM TickFallOffAlert AS
    SELECT * FROM SetThreshold 
    WHERE LastTicksPerSecond < AvgTicksPerSecond * AlertThreshold;

The Completed StreamSQL File

The following combines the steps above into a single StreamSQL application.

CREATE INPUT STREAM TicksIn (
    Symbol string,
    SourceTimestamp timestamp,
    BidPrice double,
    AskPrice double,
    FeedName string
);
CREATE STREAM TicksWithTime AS
    SELECT *, now() AS LocalTime FROM TicksIn;
CREATE STREAM TicksPerSecond AS
    SELECT openval() AS StartOfTimeSlice, count() AS NumberTicks, FeedName
    FROM TicksWithTime [SIZE 1 ON LocalTime PARTITION BY FeedName]
    GROUP BY FeedName;
CREATE OUTPUT STREAM TickStats AS
    SELECT openval() AS StartOfTimeSlice,
    avg(NumberTicks) AS AvgTicksPerSecond,
    stdev(NumberTicks) AS StdevTicksPerSecond,
    lastval(NumberTicks) AS LastTicksPerSecond,
    FeedName
    FROM TicksPerSecond [SIZE 20 ADVANCE 1 ON StartOfTimeSlice
    PARTITION BY FeedName]
    GROUP BY FeedName;
CREATE STREAM SetThreshold AS
    SELECT *, .75 AS AlertThreshold FROM TickStats;
CREATE OUTPUT STREAM TickFallOffAlert AS
    SELECT * FROM SetThreshold 
    WHERE LastTicksPerSecond < AvgTicksPerSecond * AlertThreshold;

Running the StreamSQL Application

Running the application depends on the ability to rapidly submit a large number of tuples to the input stream. This makes it impractical to test the application using Studio's Manual Input view, so we use a feed simulation and a data file instead. Both are provided with the Market Feed Monitor demo: a feed simulation file and the CSV test data file it reads.

Inspecting the Feed Simulation File (optional)

The Financial - Market Feed Monitor project contains a feed simulation, MarketData.sbfs, which reads the data file marketfeed.csv. You can use this feed simulation to run your .ssql file. If you want to see how the feed simulation uses the data file, you can:

  • Double-click MarketData.sbfs in the Package Explorer to open it in the Feed Simulation editor.

  • Notice the TicksIn schema in the Simulation Streams section. Click the triangle widget next to it to see the fields that the feed simulation generates on that stream.

  • Scroll down to the Generation Method for TicksIn section. Notice that the selected method is Data File.

  • Click Options underneath Data File to open the Data File Options dialog.

  • At the top, the Data File is set to marketfeed.csv in the project folder. Notice the records, shown both in the preview and formatted in the pane below that. The first record is:

    IBM,2006-04-31 10:18:23.0347,2006-04-31 10:18:23.0247,81.37,2006-04-31 10:18:23.0197,100,100,1,0,NYSE
    

    Fields 1 (stock symbol), 3 (timestamp), 4 (bid or ask price) and 10 (feed name) correspond to the fields required by the input stream. Since each data file record includes unneeded fields, the feed simulation must select only the required fields. The fields used are indicated in the Column mapping pane.

  • When you are finished inspecting the feed simulation file, click Cancel.

Running the Application

In the SB Authoring perspective, make sure the tutorial_StreamSQL.ssql editor session is selected and active, then click the Run button. This opens the SB Test/Debug perspective and starts the application.

In the Feed Simulations view, highlight the feed simulation file and click Run.
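Alternatively, you can run the feed simulation from a StreamBase Command Prompt with the sbfeedsim utility. The following one-line command is a sketch that assumes the application is already running on the default server URI and that the command is issued from the project folder; see the sbfeedsim documentation for the available options.

sbfeedsim MarketData.sbfs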

Monitor the input and output tuples in the Application Input and Application Output views. Note that tuples appear on both output streams.

When done, press F9 or click the Stop Running Application button.