XML Normalizer Operator Sample

This sample shows how to use the TIBCO StreamBase® XML Normalizer operator.

Introduction

The XML Normalizer operator is a global Java operator that parses a designated field containing an XML-formatted string and emits one tuple for each top-level element extracted from that string. Each emitted tuple contains a user-defined set of string fields parsed from the XML, plus an optional field that reports any XML parsing errors. All input fields other than the XML string field can optionally be passed through unchanged to each emitted tuple. The exception is input fields of type tuple or list, which are not supported and are emitted as null.
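The extraction behavior described above can be illustrated outside StreamBase with the JDK's built-in SAX parser. The following is a minimal sketch, not the operator's implementation. It assumes a hypothetical layout in which a root element wraps repeated child elements (for example <trades><trade>...</trade></trades>), and that each configured field name matches a child element of those elements. The class name NormalizerSketch, the Row record, and the element names are illustrative only, and field pass-through is not shown. Because SAX reports elements as it reads them, rows for elements that parse cleanly are kept when a later error occurs; the error then adds one row with all extracted fields null and the parser's message in errorMessage.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only; not the XML Normalizer operator's actual code.
public class NormalizerSketch {

    /** One output row: extracted string fields (null if absent) plus an optional error. */
    public record Row(Map<String, String> fields, String errorMessage) {}

    /** Returns a map with every configured field name mapped to null. */
    private static Map<String, String> emptyFields(List<String> fieldNames) {
        Map<String, String> m = new LinkedHashMap<>();
        for (String name : fieldNames) {
            m.put(name, null);
        }
        return m;
    }

    /**
     * Emits one Row per child of the root element (assumed layout:
     * <root><item><field>value</field>...</item>...</root>). If parsing
     * fails, rows already completed are kept and one all-null row carrying
     * the parser's message is appended.
     */
    public static List<Row> normalize(String xml, List<String> fieldNames) {
        List<Row> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;
            private Map<String, String> current;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                depth++;
                if (depth == 2) {
                    current = emptyFields(fieldNames); // a new top-level element begins
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth == 3 && current.containsKey(qName)) {
                    current.put(qName, text.toString().trim()); // values stay strings
                } else if (depth == 2) {
                    rows.add(new Row(current, null)); // element complete: emit a row
                }
                depth--;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Parse failure: append one row with null fields and the error text.
            rows.add(new Row(emptyFields(fieldNames), e.getMessage()));
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<trades>"
                + "<trade><symbol>IBM</symbol><price>164.25</price></trade>"
                + "<trade><symbol>DELL</symbol></trades>"; // unterminated <trade>
        for (Row row : normalize(xml, List.of("symbol", "price"))) {
            System.out.println(row);
        }
    }
}
```

Running main prints one complete row for the IBM element and one error row for the unterminated second <trade> element.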

See the description of the operator's Properties view in Using the XML Normalizer Operator.

The XMLSimple.sbapp sample EventFlow module illustrates the following aspects of using the XML Normalizer operator:

  • The operator emits all extracted XML fields as strings, including XML fields that hold numeric data. The sample module includes a Map operator, ConvertToNumbers, that converts two extracted numeric fields to StreamBase int and double data types.

  • The operator emits all extracted XML fields as a set of string fields. The sample module includes a second Map operator, ConvertToTuple, that converts the extracted XML fields to a single tuple field for further processing downstream.
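The two Map operators first convert strings to numbers and then group the fields into one value. The same two steps can be sketched in Java as follows. This is an illustration under assumptions, not StreamBase expression language: the field names symbol, price, currency, quantity, and exchange are hypothetical, inferred only from the order of values in the sample output (for example MSFT,25.48,USD,2000,NASDAQ). Boxed Integer and Double are used so that null strings in an error row stay null, like the all-null Trade subtuple the sample emits for its error tuple.

```java
// Illustrative sketch; field names are hypothetical, not the sample's schema.
public class ConvertSketch {

    /** Plays the role of the Trade tuple field built by ConvertToTuple. */
    public record Trade(String symbol, Double price, String currency,
                        Integer quantity, String exchange) {}

    /** ConvertToNumbers step: string to double; null stays null. */
    static Double toDouble(String s) {
        return s == null ? null : Double.valueOf(s.trim());
    }

    /** ConvertToNumbers step: string to int; null stays null. */
    static Integer toInt(String s) {
        return s == null ? null : Integer.valueOf(s.trim());
    }

    /** ConvertToTuple step: bundles the converted fields into one value. */
    public static Trade toTrade(String symbol, String price, String currency,
                                String quantity, String exchange) {
        return new Trade(symbol, toDouble(price), currency, toInt(quantity), exchange);
    }

    public static void main(String[] args) {
        System.out.println(toTrade("MSFT", "25.48", "USD", "2000", "NASDAQ"));
        System.out.println(toTrade(null, null, null, null, null));
    }
}
```

In this sketch a malformed number throws NumberFormatException; how the sample's Map operators treat that case is not covered here.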

This Sample's Files

The XML Normalizer sample includes the following files:

File                         Purpose
XMLSimple.sbapp              Sample EventFlow module that illustrates using the XML Normalizer operator.
XMLSimple.sblayout           The layout file associated with the sample EventFlow module.
TradeHist.xml                An example of trade data formatted as standard, indented XML.
TradeHist.csv                A CSV file with three fields. The first field contains the XML content of TradeHist.xml, flattened to a single string with all line endings removed. The XML string field ends with one deliberate error that demonstrates how the operator handles XML parsing errors. This file serves as input for the TradeHist.sbfs feed simulation.
TradeHist-unflattened.csv    Same contents as TradeHist.csv, but with the first field shown as standard indented XML with embedded line ending characters. You can experiment with loading this version of the CSV file as input for the feed simulation, using a custom file reader that you write to remove the line ending characters from the XML string field.
TradeHist.sbfs               Feed simulation file that loads TradeHist.csv as its input file.
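Loading TradeHist-unflattened.csv requires a reader that accepts quoted fields spanning several lines and removes the line endings from them. The following Java sketch shows one way such a reader could work. It is a hypothetical standalone helper, not a StreamBase feed simulation or adapter API, and connecting it to a feed simulation is not shown. It assumes RFC 4180-style quoting (fields wrapped in double quotes, embedded quotes doubled) and drops CR and LF characters only inside quoted fields; indentation spaces in the XML are kept, since whitespace between elements does not affect well-formedness.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a StreamBase API.
public class UnflattenReaderSketch {

    /**
     * Splits CSV text into records of string fields. Quoted fields may span
     * lines; CR and LF characters inside quotes are dropped, so a multi-line
     * XML field comes back as a single-line string. Assumes RFC 4180-style
     * quoting, where an embedded quote is written as two quotes ("").
     */
    public static List<List<String>> parse(String csv) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < csv.length() && csv.charAt(i + 1) == '"') {
                        field.append('"'); // escaped quote ("")
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else if (c != '\r' && c != '\n') {
                    field.append(c);       // line endings inside quotes are dropped
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {        // end of record outside quotes
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString()); // last record without trailing newline
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "\"<trades>\n  <trade><symbol>IBM</symbol></trade>\n</trades>\",4456,After Hours\n";
        System.out.println(parse(csv));
    }
}
```

The values 4456 and After Hours in main are borrowed from the sample output; the XML content is made up for the example.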

Running the XML Normalizer Sample in StreamBase Studio

  1. In the Package Explorer view, in project sample_xml-normalizer, double-click to open the XMLSimple.sbapp application.

  2. With the XMLSimple.sbapp application selected and active, click the Run button. This opens the SB Test/Debug perspective and starts the application.

  3. In the Feed Simulations view, select the TradeHist.sbfs feed simulation file and click Run.

  4. In the Application Output view, observe that four tuples are emitted. Select each tuple in turn to see its contents in the Details Pane. Click the arrow to the left of the Trade subtuple to see the values of that tuple field.

    1. The first three tuples show a Trade tuple generated from fields extracted from the XML input field in the feed simulation.

    2. The first three tuples show null for the XMLErrorMessage field, and each has the input tuple's two non-XML fields appended verbatim.

    3. The last emitted tuple shows null for all fields of the Trade subtuple, and shows error text from the XML parser. The last emitted tuple is generated from an incomplete <trade> element in the CSV file read by the feed simulation.

  5. Experiment with edits to the TradeHist.csv input file. For example, you can eliminate the fourth error tuple by removing the incomplete <trade> element from the end of the XML string field. As an alternative, you can generate an XML error earlier in the sequence by creating a deliberate XML error in the first, second, or third <trade> element in the XML string field. Re-run the feed simulation each time to see the results of your experiments.

  6. When done, press F9 or click the Stop Running Application button.

Running the XML Normalizer Sample in Terminal Windows

This section describes how to run the sample in UNIX terminal windows or Windows command prompt windows. On Windows, be sure to use the StreamBase Command Prompt from the Start menu as described in the Test/Debug Guide, not the default command prompt.

  1. Open three terminal windows on UNIX, or three StreamBase Command Prompts on Windows. In each window, navigate to the directory where the sample is installed, or to your workspace copy of the sample, as described in Sample Location below.

  2. In window 1, start StreamBase Server with this command:

    sbd XMLSimple.sbapp

  3. In window 2, start the StreamBase dequeuer. Enter:

    sbc dequeue OutputStream

    No output is displayed at this point, but the dequeuer is prepared to receive output. This window will eventually show the output of all the query operations.

  4. In window 3, enqueue data to your application with the following command:

    sbfeedsim TradeHist.sbfs

  5. In window 2, observe three emitted tuples and one error tuple like the following example:

    "MSFT,25.48,USD,2000,NASDAQ",null,4456,After Hours
    "IBM,164.25,USD,5000,NYSE",null,4456,After Hours
    "DELL,14.26,USD,20000,NASDAQ",null,4456,After Hours
    "null,null,null,null,null","Sax ParsingError: The element type ""trade"" must be terminated by the matching end-tag ""</trade>"".",4456,After Hours

  6. In window 2, press Ctrl+C to exit the sbc dequeue session.

  7. In window 3, type the following command to terminate the server and dequeuer:

    sbadmin shutdown

Importing This Sample into StreamBase Studio

In StreamBase Studio, import this sample with the following steps:

  • From the top menu, click File > Load StreamBase Sample.

  • Select xml-normalizer from the Data Constructs and Operators category.

  • Click OK.

StreamBase Studio creates a single project, sample_xml-normalizer, for this sample.

Sample Location

When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.

Important

Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.

Using the workspace copy of the sample avoids the permission problems that can occur when trying to work with the initially installed location of the sample. The default workspace location for this sample is:

studio-workspace/sample_xml-normalizer

See Default Installation Directories for the location of studio-workspace on your system.

In the default TIBCO StreamBase installation, this sample's files are initially installed in:

streambase-install-dir/sample/xml-normalizer

See Default Installation Directories for the default location of streambase-install-dir on your system.