========================================
Preprocessor Sample
========================================

This sample demonstrates the use of a preprocessor to manipulate data before
it arrives at a table. A preprocessor applies to all data coming to a table,
even if there are multiple data sources feeding into that table.

A preprocessor differs from a data transform or an aggregation, as these
actions create a data source by transforming the data in one table and
publishing it in a different table.

See the sample lv-sample-transform, which builds on this sample, for how a
preprocessor applies to all data sources, including those coming from a
transform.

=================
SETUP
=================

For this sample, imagine that you have a very limited version of a NYSE data
feed. It has only these fields:

- symbol: Stock symbol. In this sample, the only symbols are IBM and T.
- dateTime: The time of a trade, with precision to tenths of seconds.
- price: The price of a trade.
- shares: The number of shares.
- exchange: Always has the value "NYSE".

The preprocessor adds these fields:

- change: Difference between the price of this trade and the price of the
  last trade for the same stock.
- changeDir: An integer value of -1, 0, or 1, which represents the
  arithmetic sign of change.

==================
RUN THE SAMPLE
==================

To run this sample in Studio, you can:

- In the LiveView Project Viewer, click the green Run button in the upper
  right.
- Right-click any of the lvconf table configuration files in the Project
  Explorer view and select Run As > Run Configurations > Run (in the invoked
  Run Configurations dialog).
- Right-click the project folder itself, and select Run As > LiveView
  Fragment.

To run this sample from the command line outside of Studio, you must:

- Package this sample's LiveView fragment project into a fragment archive.
- Create a separate StreamBase Application project, and set the pom.xml for
  that project to depend on the fragment archive created in the previous
  step.
- Create a separate StreamBase Application archive file.
- Install that archive into a StreamBase Runtime node.
- Start the node.

These steps are described in more detail in the "Deploy with epadmin" page of
the Concepts Overview in the StreamBase documentation.

=================
PROJECT FILES
=================

The table configuration files in this project are:

Trades.lvconf:
    Table that represents the NYSE data. The lvconf file for this table has
    the following tags:

    fields:
        The fields that come from the NYSE data source as well as the
        additional fields that will be added by the preprocessor.

    primary-key:
        By declaring both symbol and dateTime as primary keys, each new
        trade will be added to the table rather than replacing an existing
        record. (Even though the simulation data is only specific to tenths
        of seconds, it has been manipulated so that there are no dateTime
        duplicates for any one symbol. Real data would go to thousandths of
        seconds and duplicates would be impossible.)

    preprocessor-chain:
        Here is where you declare the preprocessor that you will use, which
        is a StreamBase EventFlow application. You must declare the name,
        input, and output of the application. Also, because this
        preprocessor adds fields, you must turn off type checking, because
        the default assumption is that a preprocessor will accept the same
        fields as the table.

    data-sources:
        Refers to the data source, which is defined in the file
        NYSEDataSource.lvconf.

NYSEDataSource.lvconf:
    Declares the data source. In a real-world application, it would refer to
    a data feed. In this sample, it refers to the feed simulation, the parts
    of which are contained within the src/main/resources folder.

DefaultTableSpace.lvconf:
    All tables refer to this table space, where project-wide options are
    configured.

The remaining files in this project are:

TradesPreprocessor.sbapp:
    The preprocessor EventFlow fragment. It has an in-memory query table
    that holds the last price for every symbol.
    First, the sample uses the value in the table from the last trade for
    the current symbol to calculate the two fields. Then it saves the
    current value into the table to be used by the next trade that comes in
    for this symbol.

    The individual elements in the EventFlow are:

    INPUT
        QueryIn: This stream, with this exact schema, is required to appear
        in any preprocessor.

        DataIn: The name of the input must match the name declared in the
        preprocessor element of the lvconf file that references the
        preprocessor (Trades.lvconf, in this case). Also, this input must
        declare the following fields in addition to the fields in the input
        feed that is being preprocessed:
        -- PublisherID, string
        -- PublisherSN, long
        -- CQSReferredCount, long
        -- CQSDataUpdatePredicate, string

    OUTPUT
        DataOut: The name of the output must match the name declared in the
        preprocessor element of the lvconf file that references the
        preprocessor (Trades.lvconf, in this case). Also, this output must
        pass through the extra fields that are declared in the input.

    SPLIT
        ReadBeforeWrite: This manages the flow of the EventFlow application,
        ensuring that the upper path, which reads the data from the
        LastQuote query table, runs before the lower path of the application
        writes to it.

    QUERY TABLE
        LastQuote: An in-memory table that holds the last price for each
        symbol.

    QUERY
        GetLastQuote: Reads the last price for this symbol from the query
        table and calculates the two new fields.

        SaveCurrent: Writes the current value into the query table so it is
        available for the next trade for this symbol. Note that on the
        Operation tab, for "Type of write" = Insert, the value for "If a row
        already exists" was changed to "Update existing row using values
        below." This is because a new trade is an insert operation to the
        EventFlow application, but we want it to be an update operation in
        the Query Table.

engine.conf:
    A LiveView configuration file that sets up JVM arguments for this engine
    instance.
src/main/resources:
    Within this folder are the pieces that make up the simulation data feed
    for this sample. In a real LiveView application, the file
    NYSEDataSource.lvconf would reference an external data feed rather than
    the simulation.

src/main/resources/NYSE.csv:
    A comma-separated file of data.

src/main/resources/NYSE.sbfs:
    A StreamBase Feed Simulation. It declares the fields that come from the
    data feed, references the CSV file with the actual data, and maps data
    columns to the fields. Note that in the "Data rate" section, the
    simulator is set to feed the data timed by the dateTime column, so that
    the data arrives with realistic timing.

src/main/resources/NYSE.sbapp:
    This EventFlow application wraps the feed simulation and auto-starts the
    feed when LiveView Server is ready.
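
=================
LOGIC SKETCH
=================

The read-before-write flow of TradesPreprocessor.sbapp can be illustrated
outside of EventFlow. The following Python sketch is not part of the sample
and is not how LiveView runs the preprocessor. It only mirrors the steps
described above: read the last price for the symbol (GetLastQuote), compute
change and changeDir, then write the current price back with update-on-
existing semantics (SaveCurrent). The class and attribute names are
hypothetical. The sketch also assumes that the first trade seen for a symbol
gets change = 0.0 and changeDir = 0, which this document does not specify,
and it omits the pass-through fields (PublisherID and so on).

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Trade:
    """One tuple of the limited NYSE feed, plus the two preprocessor fields."""
    symbol: str
    date_time: str
    price: float
    shares: int
    exchange: str = "NYSE"
    change: Optional[float] = None      # filled in by the preprocessor
    change_dir: Optional[int] = None    # filled in by the preprocessor


class TradesPreprocessorSketch:
    """Hypothetical stand-in for the EventFlow preprocessor logic."""

    def __init__(self) -> None:
        # Stand-in for the LastQuote query table: symbol -> last price.
        self.last_quote: Dict[str, float] = {}

    def process(self, trade: Trade) -> Trade:
        # GetLastQuote: read the previous price BEFORE anything is written.
        previous = self.last_quote.get(trade.symbol)

        if previous is None:
            # Assumption: no earlier trade for this symbol means no change.
            change = 0.0
        else:
            change = trade.price - previous

        # Arithmetic sign of change: -1, 0, or 1.
        change_dir = (change > 0) - (change < 0)

        # SaveCurrent: insert, or update the existing row for this symbol.
        self.last_quote[trade.symbol] = trade.price

        return replace(trade, change=change, change_dir=change_dir)


if __name__ == "__main__":
    pre = TradesPreprocessorSketch()
    for t in (
        Trade("IBM", "09:30:00.1", 100.0, 200),
        Trade("IBM", "09:30:00.2", 100.5, 100),
        Trade("T", "09:30:00.3", 30.0, 50),
    ):
        print(pre.process(t))
```

Reading before writing matters here: if the current price were stored first,
every trade would be compared with itself and change would always be 0.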