Using the XML to Tuple Operator

Introduction

The XML to Tuple Java operator converts XML-encoded messages to StreamBase tuples. The operator's input port schema has a single string field that passes an XML-encoded message to the operator. The operator parses the XML message and populates tuple fields corresponding to the elements and attributes found in the message. Each XML message enqueued to the operator results in a single tuple emitted on its output port.

The operator's output schema determines the set of fields retrieved from the XML messages. The hierarchy of the fields in the schema must match that of the elements in the XML message. Fields not present in the XML message are set to null in the emitted tuple.

Repeated XML elements at a given level can be retrieved with a StreamBase field of type list. For example, an XML message containing <MyInt>1</MyInt><MyInt>2</MyInt><MyInt>3</MyInt> could be retrieved with a tuple field named MyInt of type list<int>. If, however, the MyInt tuple field is of type int, it would be populated with the value of the first MyInt XML element instance, the subsequent instances would be discarded, and a warning would be emitted for each discarded instance. Tuple list fields can be used to retrieve not only repeated leaf XML elements (as in the example above), but also repeated non-leaf elements using fields of type list<tuple>. One of the StreamBase applications shipped with this operator's sample, xml2tuple-datatypes.sbapp, illustrates both scenarios.
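The list-versus-scalar behavior described above can be sketched in plain Java with the JDK's DOM parser. This is an illustration only, not the operator's implementation: the class RepeatedElementSketch, its method names, and the <Msg> wrapper element (added so the sample is a well-formed document with a single root) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; mimics how a list<int> field and an int field
// would each see the same repeated XML element.
final class RepeatedElementSketch {

    // list<int> case: collect every direct child of the root whose tag
    // equals `tag`, in document order.
    static List<Integer> asIntList(String xml, String tag) {
        List<Integer> values = new ArrayList<>();
        Element root = parse(xml).getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                values.add(Integer.valueOf(n.getTextContent().trim()));
            }
        }
        return values;
    }

    // int case: keep the first instance and drop the rest; null when the
    // element is absent, matching the "fields not present are null" rule.
    static Integer asInt(String xml, String tag) {
        List<Integer> all = asIntList(xml, tag);
        return all.isEmpty() ? null : all.get(0);
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // <Msg> is an assumed wrapper element, not part of the original example.
        String xml = "<Msg><MyInt>1</MyInt><MyInt>2</MyInt><MyInt>3</MyInt></Msg>";
        System.out.println(asIntList(xml, "MyInt")); // [1, 2, 3]
        System.out.println(asInt(xml, "MyInt"));     // 1
    }
}
```

The sketch omits the warning that the operator emits for each discarded instance in the scalar case.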

Support for XML attributes is controlled through an operator property. When attributes are disabled, the tag of an XML leaf element typically matches the name of the tuple field that receives its value. For example, a tuple field named MyInt of type int receives the value 123 when the XML element <MyInt>123</MyInt> is processed.

When attributes are enabled, an XML element's value and attributes are retrieved through subtuples of the tuple whose name matches the XML element's tag. For example, to retrieve the value and attributes of an XML element <MyInt myattr="myattrvalue">123</MyInt>, a tuple field named MyInt of type tuple containing two subfields named _VALUE and _ATTRIBUTES should be present in the output schema. The _VALUE subfield would be of type int and receive 123, while the _ATTRIBUTES subfield would be of type list<tuple<string Name, string Value>> and receive a list with a single tuple whose Name and Value fields would contain myattr and myattrvalue, respectively.

An alternate mechanism is available for retrieving attribute values. Rather than using an _ATTRIBUTES subfield, a subfield with a name matching the attribute name and type compatible with the attribute value can be used. Thus, to retrieve the myattr attribute from the XML element above, a subfield named myattr of type string could be used.
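To make the two attribute mechanisms concrete, the following sketch models the MyInt subtuple as a java.util.Map and fills it from the example element using the JDK's DOM parser. It is a rough illustration under stated assumptions, not the operator's code: the class AttributeShapeSketch and its methods are hypothetical, and the map merely stands in for a StreamBase tuple.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; a Map stands in for the MyInt subtuple.
final class AttributeShapeSketch {

    // First mechanism: a _VALUE subfield plus an _ATTRIBUTES subfield shaped
    // like list<tuple<string Name, string Value>>.
    static Map<String, Object> toSubtuple(String xml) {
        Element el = parseRoot(xml);
        List<Map<String, String>> attributes = new ArrayList<>();
        NamedNodeMap attrs = el.getAttributes();
        // DOM does not guarantee attribute order when there are several.
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            Map<String, String> pair = new LinkedHashMap<>();
            pair.put("Name", a.getNodeName());
            pair.put("Value", a.getNodeValue());
            attributes.add(pair);
        }
        Map<String, Object> subtuple = new LinkedHashMap<>();
        subtuple.put("_VALUE", Integer.valueOf(el.getTextContent().trim()));
        subtuple.put("_ATTRIBUTES", attributes);
        return subtuple;
    }

    // Alternate mechanism: a subfield named after the attribute itself.
    // Returns null when the attribute is absent.
    static String namedAttribute(String xml, String attributeName) {
        Element el = parseRoot(xml);
        return el.hasAttribute(attributeName) ? el.getAttribute(attributeName) : null;
    }

    private static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String xml = "<MyInt myattr=\"myattrvalue\">123</MyInt>";
        // {_VALUE=123, _ATTRIBUTES=[{Name=myattr, Value=myattrvalue}]}
        System.out.println(toSubtuple(xml));
        // myattrvalue
        System.out.println(namedAttribute(xml, "myattr"));
    }
}
```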

Note

When attributes are enabled, a _VALUE subfield needs to be used to retrieve an XML element's value if no attributes are to be retrieved from that specific element.

Properties View Settings

This section describes the properties you can set for an XML to Tuple operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Class: A field that shows the fully qualified class name that implements the functionality of this operator. Use this class name when loading the operator in StreamSQL programs with the APPLY JAVA statement. You can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes or to a module parameter that evaluates to true, an instance of this operator starts as part of the containing StreamBase Server. If this field is set to No or to a module parameter that evaluates to false, the operator is loaded with the server, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager. With this option set to No or false, the operator does not start even if the application as a whole is suspended and later resumed. The recommended setting is selected by default.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Element Value Field Name: The name of the tuple subfield that receives an XML element's value; the default is _VALUE. This field must be used when attributes are being retrieved from an XML element. When attributes are disabled, or no attributes are being retrieved from a particular XML element, a tuple field with a name matching the XML element's tag can be used.

Attribute Values Supported: If enabled (the default), attributes can be retrieved from XML elements, either through a field specified by the Attribute Values Field Name property or through a field with the same name as the XML attribute.

Attribute Values Field Name: The name of the tuple subfield that receives an XML element's attributes; the default is _ATTRIBUTES. The schema of attribute value fields must be list<tuple<string Name, string Value>>.

Date/Time Format: The format used to convert date-time strings to StreamBase timestamps when parsing XML messages. The syntax of the format string is described in the java.text.SimpleDateFormat class documentation in the Java Platform SE API reference. Typical format string values include yyyyMMdd and yyyyMMdd HH:mm:ss.

Assume Local Time Zone: If enabled, date-time strings containing no timezone specifier are assumed to represent local time. If disabled (the default), date-time strings are assumed to represent GMT.

Include Null List Values: If enabled (the default), list values containing nulls are included in the generated tuple.

Null List Value Representation: The representation of null list values in XML. The default is null.

Use Namespaces: If enabled (default is disabled), the system tries to match namespaces, as well as XML elements, to schema field names. If disabled, the namespaces in the XML are ignored. For example, if enabled and an XML element is <n0:root>, the schema field name should be #"n0:root". This property is not used with XPath; see the XPath Namespaces property instead.

Namespace Field Separator: The string used to join the namespace to the element name when matching against a schema field. For example, if the separator is _ and the XML element is <n0:root>, the schema field name should be n0_root. This property is not used with XPath; see the XPath Namespaces property instead.

Enable Pass Through Fields: When enabled, all fields from the incoming tuple are replicated in the output. When selecting this option, you must specify the XML Field property. Default is disabled.

XML Field: Identifies the field of the incoming tuple that contains the XML data. This property is only used when Enable Pass Through Fields is enabled.

Field Name Replacements: Specifies key-value pairs for mapping XML element tags to field names. The mappings are applied before trying to match XML elements to schema field names. For example, if a key is '-' and its associated value is '_', then an <Activity-Input> element is converted to <Activity_Input> before attempting to match it to a schema field.

Enable Status Port: If enabled (the default), status tuples are sent to port 2. If disabled, no status is reported. If disabled after previously being enabled, the arc connected to port 2 is deleted.

Log Level: Controls the level of verbosity the operator uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE, and ALL.
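The interaction between the Date/Time Format and Assume Local Time Zone properties can be approximated with java.text.SimpleDateFormat directly. The sketch below is an assumption about how such a conversion might look, not the operator's actual code: the class DateTimeFormatSketch and its parse method are hypothetical, and the strict (non-lenient) parsing and Locale.ROOT are choices made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; shows how a SimpleDateFormat pattern and a default
// time zone together turn a date-time string into an instant.
final class DateTimeFormatSketch {

    static Date parse(String text, String pattern, boolean assumeLocalTimeZone) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false); // assumption: reject out-of-range fields
        // This zone applies only when the text itself carries no zone;
        // a zone parsed via a z, Z, or X pattern letter takes precedence.
        format.setTimeZone(assumeLocalTimeZone
                ? TimeZone.getDefault()
                : TimeZone.getTimeZone("GMT"));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "'" + text + "' does not match pattern '" + pattern + "'", e);
        }
    }

    public static void main(String[] args) {
        Date gmt = parse("20240131 12:30:00", "yyyyMMdd HH:mm:ss", false);
        System.out.println(gmt.toInstant()); // 2024-01-31T12:30:00Z

        // With local time assumed, the same text maps to a different instant
        // unless the JVM's default zone happens to be GMT.
        Date local = parse("20240131 12:30:00", "yyyyMMdd HH:mm:ss", true);
        System.out.println(local.toInstant());
    }
}
```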

XPath Properties Tab

XPath Expressions: A mapped list of XPath expressions and the schema field name associated with each.

XPath Namespaces: The XPath namespaces to resolve when parsing the XML. These values must be set in order to use namespaces with XPath operations.

The following rules apply to XPath expressions:

  • The XPath can be any valid XPath 1.0 expression.

  • Each XPath must be mapped to a top-level schema field name from the Edit Schema tab.

  • The schema field can be of any type, but note that if the XPath produces multiple node values and the data type is not a list, then the last node is used as the value.

  • To produce XML strings from the XPath expression, your field name must end with __XML and the data type must be a string or list of strings.
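The "last node wins" rule for non-list fields can be illustrated with the JDK's javax.xml.xpath API, which evaluates XPath 1.0 expressions. This is only a sketch of the rule as stated above: the class XPathMappingSketch, its method names, and the sample <order> document are hypothetical, and the operator's own evaluation may differ in detail.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; contrasts a list-typed target field with a
// scalar-typed one for the same XPath 1.0 expression.
final class XPathMappingSketch {

    // List-typed field: every matching node's text, in document order.
    static List<String> evaluateAll(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + e.getMessage(), e);
        }
    }

    // Scalar field: when several nodes match, keep only the last one;
    // null when nothing matches.
    static String evaluateScalar(String xml, String expression) {
        List<String> all = evaluateAll(xml, expression);
        return all.isEmpty() ? null : all.get(all.size() - 1);
    }

    public static void main(String[] args) {
        String xml = "<order><item>pen</item><item>ink</item></order>";
        System.out.println(evaluateAll(xml, "/order/item"));    // [pen, ink]
        System.out.println(evaluateScalar(xml, "/order/item")); // ink
    }
}
```

Namespace-qualified expressions would additionally need a javax.xml.namespace.NamespaceContext set on the XPath object, which is roughly the role the XPath Namespaces property plays for the operator.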

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Operator Ports

As shown in the diagram below, the operator has one input port and two output ports to communicate with the surrounding application.

The XML to Tuple operator's ports are used as follows:

  • XMLIn: The XML message to be converted to a tuple. The XMLIn port has the following schema:

    • XML, string: The contents of the XML message to be converted.

  • TupleOut: This output port's schema contains one or more top-level fields, each of which receives the result of parsing XML messages with a different top-level tag. For example, a TupleOut schema with top-level fields MyInt and MyString could be used to parse the XML messages <MyInt>123</MyInt> and <MyString>This is a string</MyString>, respectively. If Enable Pass Through Fields is checked, this port also contains all the fields from the input port.

  • Status: A tuple is emitted on this port when an attempt to convert an XML message to a tuple fails. The Status port has the following schema:

    • type, string: Contains the following value describing the type of event that occurred:

      • Convert

    • action, string: Contains the following value indicating the conversion failed:

      • Failed

    • object, string: Contains a string representation of the input tuple.

    • message, string: Contains a human-readable description of the conversion failure.

    • time, timestamp: Contains the time of the conversion failure.

    • inputTuple, tuple: Contains a copy of the input tuple.

Typechecking and Error Handling

The XML to Tuple operator uses typecheck messages to help you configure the operator in your StreamBase application. In particular, the operator generates typecheck messages when:

  • The XMLIn port schema does not contain exactly one field of type string and Enable Pass Through Fields is unchecked.

  • The TupleOut port schema contains a field of type list<list<?>> (which is not allowed).

  • The TupleOut port schema contains an Element Value Field (default name _VALUE) of type tuple or list (which is also not allowed).

  • The TupleOut port schema contains an Attribute Values Field (default name _ATTRIBUTES) that is not of type list<tuple<string Name, string Value>>.

  • The Attribute Values Supported property is enabled and no Attribute Values Field Name is specified.

  • The Element Value Field Name and Attribute Values Field Name properties contain the same non-empty value.

  • The TupleOut port schema contains at least one timestamp field and no Date/Time Format string is specified.

  • The Enable Pass Through Fields property is checked and no XML Field is specified.

  • A value is specified in the Field Name Replacements property without a corresponding key.

  • An invalid Date/Time Format string is specified.
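A Date/Time Format string can be checked ahead of time outside StreamBase, since the SimpleDateFormat constructor rejects illegal patterns with an IllegalArgumentException. The sketch below assumes the operator's typecheck is comparable to this; the class DateFormatCheckSketch and its method are hypothetical.

```java
import java.text.SimpleDateFormat;

// Hypothetical pre-check for a Date/Time Format value.
final class DateFormatCheckSketch {

    // Returns true when `pattern` is a non-empty, legal SimpleDateFormat pattern.
    static boolean isValidPattern(String pattern) {
        if (pattern == null || pattern.isEmpty()) {
            return false; // treated like "no format string specified"
        }
        try {
            new SimpleDateFormat(pattern); // throws on illegal letters or unterminated quotes
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidPattern("yyyyMMdd HH:mm:ss")); // true
        System.out.println(isValidPattern("bogus"));             // false
    }
}
```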

The operator emits a tuple on the Status port when an attempt to convert an XML message to a tuple fails.