Using the TIBCO StreamBase Field Serializer Operator

Introduction

The Field Serializer operator is a Java operator that provides a way to serialize the unused fields of a large tuple into a single blob field, leaving untouched the fields that your application must process. At the end of your processing chain, after processing the fields of interest, you can deserialize the blob field to reconstruct the tuple's unused fields. This effectively condenses a large tuple into fewer fields as it passes through your application, which can improve throughput and performance.

The Field Serializer operator is a member of the Java Operator group in the Palette view in StreamBase Studio. Select the Field Serialize operator from the Insert an Operator or Adapter dialog, which you invoke with one of the following methods:

Drag the Adapters, Java Operators token from the Operators and Adapters drawer of the Palette view to the canvas.
Click in the canvas where you want to place the operator, and invoke the keyboard shortcut O V.
From the top-level menu, invoke Insert → Operator → Java.

From the Insert an Operator or Adapter dialog that opens, select Field Serialize and double-click or press OK.

The Field Serializer operator is expected to be used in matched pairs: one in serialize mode, and another farther downstream in deserialize mode.

For example, you might have a multi-module application that must process a complex data feed whose incoming tuple has 100 fields. But your StreamBase application only needs to analyze and modify 10 of those fields. In many applications, you can simply discard the unused 90 fields with a Map or Filter operator near the beginning of your processing chain. But there are scenarios where you must preserve the entire incoming tuple throughout, perhaps for consumption by another application downstream that expects the full tuple. In these cases, the StreamBase application would be carrying 90 untouched fields through every step of processing.

In those cases, you can use a Field Serializer operator in serialize mode to specify:

The field names of the 90 fields your application does not need.
The name of a field to be appended to the tuple by the operator.

The Field Serializer operator serializes the specified 90 fields into the appended field. Thus, in this example, the incoming tuple has 100 fields, but the outgoing tuple has 11 fields: the incoming 10 that you did not mark for serialization, plus the appended field that contains the serialized 90 fields.
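To make the field arithmetic concrete, here is a minimal Java sketch of serialize mode. It is not the operator's implementation: it assumes a tuple can be modeled as an ordered Map of field names to values, it uses standard Java object serialization as a stand-in for the real blob encoding, and the field names f0 through f99 are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SerializeSketch {
    // A tuple is modeled as an ordered map of field name -> value (an assumption for illustration).
    static Map<String, Object> serialize(Map<String, Object> in,
                                         List<String> fieldsToSerialize,
                                         String serializedField) throws IOException {
        Set<String> toPack = new HashSet<>(fieldsToSerialize);
        Map<String, Object> out = new LinkedHashMap<>();              // fields passed through
        LinkedHashMap<String, Object> packed = new LinkedHashMap<>(); // fields hidden in the blob
        for (Map.Entry<String, Object> e : in.entrySet()) {
            if (toPack.contains(e.getKey())) {
                packed.put(e.getKey(), e.getValue());
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        // Java serialization stands in for whatever encoding the real operator uses.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(packed);
        }
        out.put(serializedField, bytes.toByteArray()); // the appended "blob" field
        return out;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> tuple = new LinkedHashMap<>();
        List<String> unused = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            tuple.put("f" + i, i);            // hypothetical fields f0..f99
            if (i >= 10) unused.add("f" + i); // f10..f99 are not needed by the application
        }
        Map<String, Object> out = serialize(tuple, unused, "serializedFields");
        System.out.println(tuple.size() + " fields in, " + out.size() + " fields out");
    }
}
```

Calling serialize with an empty field list mirrors the operator's default configuration: every incoming field passes through, and one extra blob field is appended that holds no serialized fields.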

Farther downstream in your application, add another Field Serializer operator, this time in deserialize mode. In this case, the incoming tuple must include, at minimum, the blob field that contains the serialized 90 fields. The incoming tuple can also contain the 10 fields processed by your application. The resulting output tuple is the original 100-field tuple, with 10 fields processed and 90 fields unchanged.
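The matching deserialize step can be sketched the same way. Under the same assumptions (Map-based tuples, Java serialization as a stand-in encoding, hypothetical field names f0 through f99), this round trip hides f10 through f99 in a blob, lets downstream code modify f0 through f9, and then restores all 100 fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RoundTripSketch {
    // Serialize mode: move the listed fields into one byte[] field appended at the end.
    static Map<String, Object> serialize(Map<String, Object> in, Set<String> toPack,
                                         String blobField) throws IOException {
        Map<String, Object> out = new LinkedHashMap<>();
        LinkedHashMap<String, Object> packed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : in.entrySet()) {
            if (toPack.contains(e.getKey())) {
                packed.put(e.getKey(), e.getValue());
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(packed);
        }
        out.put(blobField, bytes.toByteArray());
        return out;
    }

    // Deserialize mode: remove the blob field and restore the fields it carried.
    @SuppressWarnings("unchecked")
    static Map<String, Object> deserialize(Map<String, Object> in, String blobField)
            throws IOException, ClassNotFoundException {
        Map<String, Object> out = new LinkedHashMap<>(in);
        byte[] blob = (byte[]) out.remove(blobField);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            out.putAll((Map<String, Object>) ois.readObject());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> original = new LinkedHashMap<>();
        Set<String> unused = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            original.put("f" + i, i);
            if (i >= 10) unused.add("f" + i);
        }
        Map<String, Object> narrow = serialize(original, unused, "serializedFields");
        // Downstream processing touches only the 10 fields left visible.
        for (int i = 0; i < 10; i++) {
            narrow.put("f" + i, i * 2);
        }
        Map<String, Object> restored = deserialize(narrow, "serializedFields");
        System.out.println(restored.size() + " fields; f5=" + restored.get("f5")
                + ", f50=" + restored.get("f50"));
    }
}
```

In this sketch the restored fields are appended after the processed ones rather than returned to their original positions; comparing the maps checks field names and values, not field order.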

Ports

By default, the Field Serializer operator has one input port and one output port.

In its default configuration, the operator is almost a pass-through operator that copies the tuple on its input port to its output port, appending one field of type blob, named in the Serialized Field control on the Operator Properties tab. By default, the appended field does not contain any serialized fields. That is, the operator does not serialize any fields until you list the fields to be serialized in the Edit Schema tab.

You can also add an optional Error Output port, which outputs a StreamBase error tuple for any error thrown by the operator, as described in Properties: General Tab.

Properties: General Tab

This section describes the properties on the General tab in the Properties view for the Field Serializer operator.

Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
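The naming rule can be expressed as a regular expression. This Java sketch assumes "alphabetic" means the ASCII letters A through Z and a through z; it checks only the syntax, not uniqueness within the application, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ComponentNameCheck {
    // First character: letter or underscore; remaining characters: letters, digits, or underscores.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"FieldSerializer1", "_deser", "2ndSerializer", "field-serializer"};
        for (String n : candidates) {
            System.out.println(n + " -> " + isValidName(n));
        }
    }
}
```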

Operator: A read-only field that shows the formal name of the operator.

Class: A field that shows the fully qualified class name that implements the functionality of this operator. Use this class name when loading the operator in StreamSQL programs with the APPLY JAVA statement. You can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes or to a module parameter that evaluates to true, an instance of this operator starts as part of the containing StreamBase Server. If this field is set to No or to a module parameter that evaluates to false, the operator is loaded with the server, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager. With this option set to No or false, the operator does not start even if the application as a whole is suspended and later resumed. The recommended setting is selected by default.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Properties: Operator Properties Tab

This section describes the properties on the Operator Properties tab in the Properties view for the Field Serializer operator.

Property	Data Type	Default	Description
Output type	Radio buttons	Serialize	Choose Serialize or Deserialize to specify the operation of this operator instance.
Serialized Field	string	serializedFields	For operators whose Output type is Serialize, specifies the name of the field this operator is to append to the outgoing tuple. The appended field is always of data type blob, and contains the serialization of the fields as listed in the Edit Schema tab. For operators whose Output type is Deserialize, specifies the name of the blob field in the incoming tuple that contains a set of fields serialized by an upstream Field Serializer operator. You must specify the expected field contents of the serialized field in the Edit Schema tab.

Properties: Edit Schema Tab

For the Field Serializer operator, use the Edit Schema tab in two cases:

For operators whose Output type is Serialize, to specify the fields of the incoming tuple to be serialized into the appended field.
For operators whose Output type is Deserialize, to specify the schema of the serialized fields in the incoming field designated as the Serialized Field.

The schema of the fields you serialize must exactly match the schema of the fields you deserialize. StreamBase Systems strongly recommends using a named schema to make sure the serialize and deserialize schemas are identical.

Note

Typechecking of your application module cannot validate that the schemas of your serialize and deserialize operators are identical. If you have a schema mismatch in the two operators of a matched pair of Field Serializer operators, the error can only be reported at runtime in the Error Log view in Studio, or on the console for command-line launches of StreamBase Server.
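To see why a mismatch can surface only at runtime, consider this hypothetical blob layout in which each serialized field carries its name and type. Nothing compares the writer's and reader's schemas ahead of time; the reader detects the difference only when it decodes an actual blob. The shared TRADE constant plays the role of the recommended named schema. The format, the class, and the two supported types (string and int) are assumptions for illustration, not the Field Serializer operator's actual encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaMismatchSketch {
    // Builds an ordered schema from name/type pairs, e.g. schema("symbol", "string").
    static Map<String, String> schema(String... nameTypePairs) {
        Map<String, String> s = new LinkedHashMap<>();
        for (int i = 0; i < nameTypePairs.length; i += 2) {
            s.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        return s;
    }

    // One shared definition, standing in for a named schema used by both operators.
    static final Map<String, String> TRADE = schema("symbol", "string", "quantity", "int");

    // Writes each field as (name, type, value); only "int" and "string" are supported here.
    static byte[] write(Map<String, String> schema, Map<String, Object> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, String> f : schema.entrySet()) {
                out.writeUTF(f.getKey());
                out.writeUTF(f.getValue());
                if (f.getValue().equals("int")) {
                    out.writeInt((Integer) values.get(f.getKey()));
                } else {
                    out.writeUTF((String) values.get(f.getKey()));
                }
            }
        }
        return bytes.toByteArray();
    }

    // Reads the blob against the schema the downstream side expects, failing on any difference.
    static Map<String, Object> read(Map<String, String> schema, byte[] blob) throws IOException {
        Map<String, Object> values = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            for (Map.Entry<String, String> f : schema.entrySet()) {
                String name = in.readUTF();
                String type = in.readUTF();
                if (!name.equals(f.getKey()) || !type.equals(f.getValue())) {
                    throw new IllegalStateException("schema mismatch: expected " + f.getKey() + " "
                            + f.getValue() + ", found " + name + " " + type);
                }
                if (type.equals("int")) {
                    values.put(name, in.readInt());
                } else {
                    values.put(name, in.readUTF());
                }
            }
            if (in.available() > 0) {
                throw new IllegalStateException("schema mismatch: blob has extra fields");
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> trade = new LinkedHashMap<>();
        trade.put("symbol", "IBM");
        trade.put("quantity", 100);
        byte[] blob = write(TRADE, trade);
        System.out.println(read(TRADE, blob)); // same schema on both sides: succeeds
        Map<String, String> drifted = schema("symbol", "string", "quantity", "long");
        try {
            read(drifted, blob); // compiles fine; the mismatch appears only when data is decoded
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```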

Use the Edit Schema tab much like other Edit Schema tabs throughout StreamBase Studio:

Use the control at the top of the Edit Schema tab to specify the schema type:

Named schema

Use the drop-down list to select the name of a named schema previously defined in or imported into this module. The drop-down list is empty unless you have defined or imported at least one named schema for the current module.

When you select a named schema, its fields are loaded into the schema grid, overriding any schema fields already present. Once you import a named schema, the schema grid is dimmed and can no longer be edited. To restore the ability to edit the schema grid, re-select Private Schema from the drop-down list.

Private schema

Populate the schema fields using one of these methods:

Define the schema's fields manually, using the Add button to add a row for each schema field. You must enter values for the Field Name and Type cells; the Description cell is optional. For example:

Field Name	Type	Description
symbol	string	Stock symbol
quantity	int	Number of shares

Field names must follow the StreamBase identifier naming rules. The data type must be one of the supported StreamBase data types, including, for tuple fields, the identifier of a named schema and, for override fields, the data type name of a defined capture field.

Add and extend a parent schema. Use the Add button's Add Parent Schema option to select a parent schema, then optionally add local fields that extend the parent schema. If the parent schema includes a capture field used as an abstract placeholder, you can override that field with an identically named concrete field. Schemas must be defined in dependency order. If a schema is used before it is defined, an error results.
Copy an existing schema whose fields are appropriate for this component. To reuse an existing schema, click the Copy Schema button. (You may be prompted to save the current module before continuing.)

In the Copy Schema dialog, select the schema of interest as described in Copying Schemas. Click OK when ready, and the selected schema fields are loaded into the schema grid. Remember that this is a local copy and any changes you make here do not affect the original schema that you copied.

The existing schema can be from a system stream, or from any named or unnamed schema defined in the current module or in another application in your workspace. You can also select a CSV text file and populate a schema with its column headers. Studio attempts to infer data types from the first few rows of values, and you can override the types it identifies. Currently, auto-detection of int, double, boolean, string, timestamp, and tuple types is supported, but not of lists or functions. To indicate tuples, the CSV header must identify subtuple fields with dot notation, for example, stock.symbol and stock.price.
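The header handling and type detection described above can be approximated as follows. This is a hypothetical sketch, not Studio's algorithm: it assumes comma-separated cells with no quoting, timestamps in yyyy-MM-dd HH:mm:ss form, and only one level of dot nesting, and it renders sub-tuples as tuple(...) purely for display.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class CsvSchemaSketch {
    // Assumed timestamp layout; real CSV files may use others.
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Guesses a type for a single cell, trying the narrowest type first.
    static String guessType(String v) {
        if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) return "boolean";
        try { Integer.parseInt(v); return "int"; } catch (NumberFormatException e) { /* not an int */ }
        try { Double.parseDouble(v); return "double"; } catch (NumberFormatException e) { /* not a double */ }
        try { LocalDateTime.parse(v, TS); return "timestamp"; } catch (DateTimeParseException e) { /* not a timestamp */ }
        return "string";
    }

    // Combines guesses from two rows: int and double widen to double; other conflicts become string.
    static String merge(String a, String b) {
        if (a == null || a.equals(b)) return b;
        if ((a.equals("int") && b.equals("double")) || (a.equals("double") && b.equals("int"))) return "double";
        return "string";
    }

    // Infers a schema from the header and the first few data rows; one level of dot nesting only.
    @SuppressWarnings("unchecked")
    static String inferSchema(String header, List<String> rows) {
        String[] names = header.split(",");
        String[] types = new String[names.length];
        for (String row : rows) {
            String[] cells = row.split(",", -1);
            for (int i = 0; i < names.length; i++) {
                types[i] = merge(types[i], guessType(cells[i].trim()));
            }
        }
        Map<String, Object> top = new LinkedHashMap<>(); // value: a type name or a sub-tuple map
        for (int i = 0; i < names.length; i++) {
            String name = names[i].trim();
            int dot = name.indexOf('.');
            if (dot < 0) {
                top.put(name, types[i]);
            } else {
                Map<String, String> sub = (Map<String, String>) top.computeIfAbsent(
                        name.substring(0, dot), k -> new LinkedHashMap<String, String>());
                sub.put(name.substring(dot + 1), types[i]);
            }
        }
        StringJoiner out = new StringJoiner(", ");
        for (Map.Entry<String, Object> e : top.entrySet()) {
            if (e.getValue() instanceof Map) {
                StringJoiner inner = new StringJoiner(", ", "tuple(", ")");
                for (Map.Entry<String, String> s : ((Map<String, String>) e.getValue()).entrySet()) {
                    inner.add(s.getKey() + " " + s.getValue());
                }
                out.add(e.getKey() + " " + inner);
            } else {
                out.add(e.getKey() + " " + e.getValue());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(inferSchema("stock.symbol,stock.price,qty,filled,time", Arrays.asList(
                "IBM,101.5,100,true,2024-01-02 09:30:00",
                "MSFT,99,200,false,2024-01-02 09:31:00")));
    }
}
```

Merging guesses across rows is what lets a column such as stock.price, which holds both 101.5 and 99, settle on double rather than int.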

Use the Remove, Move Up, and Move Down buttons to edit and order your schema fields.

Optionally, document your schema in the Schema Description field.

Properties: Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.