The XML to Tuple Java operator converts XML-encoded messages to StreamBase tuples. The operator's input port schema has a single string field that passes an XML-encoded message to the operator. The operator parses the XML message and populates tuple fields corresponding to the elements and attributes found in the message. Each XML message enqueued to the operator results in a single tuple emitted on its output port.
The operator's output schema determines the set of fields retrieved from the XML messages. The hierarchy of the fields in the schema must match that of the elements in the XML message. Fields not present in the XML message are set to null in the emitted tuple.
Repeated XML elements at a given level can be retrieved with a StreamBase field of type list. For example, an XML message containing <MyInt>1</MyInt><MyInt>2</MyInt><MyInt>3</MyInt> could be retrieved with a tuple field named MyInt of type list<int>. If, however, the MyInt tuple field is of type int, it would be populated with the value of the first MyInt XML element instance, the subsequent instances would be discarded, and a warning would be emitted for each discarded instance. Tuple list fields can be used to retrieve not only repeated leaf XML elements (as in the example above), but also repeated non-leaf elements using fields of type list<tuple>. One of the StreamBase applications shipped with this operator's sample, xml2tuple-datatypes.sbapp, illustrates both scenarios.
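The list versus scalar behavior described above can be imitated outside StreamBase with the JDK's DOM parser. The following is only a sketch, not the operator's implementation: the class name, the method names, and the <Msg> wrapper element used to make the sample well-formed are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: mimics the documented list<int> and int behavior.
class RepeatedElementsSketch {

    // Collects the value of every direct child element named `tag`,
    // in document order, as a list<int> field would receive them.
    static List<Integer> readAsList(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Integer> values = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild();
                    n != null; n = n.getNextSibling()) {
                if (n instanceof Element && n.getNodeName().equals(tag)) {
                    values.add(Integer.parseInt(n.getTextContent().trim()));
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Keeps only the first instance, as an int field would, and records
    // one warning per discarded instance. Returns null if the tag is absent.
    static Integer readAsScalar(String xml, String tag, List<String> warnings) {
        List<Integer> all = readAsList(xml, tag);
        for (int i = 1; i < all.size(); i++) {
            warnings.add("Discarded repeated <" + tag + "> instance " + (i + 1));
        }
        return all.isEmpty() ? null : all.get(0);
    }
}
```

Calling readAsList on <Msg><MyInt>1</MyInt><MyInt>2</MyInt><MyInt>3</MyInt></Msg> collects all three values, while readAsScalar keeps the first and adds two warnings to the supplied list.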
Support for XML attributes is controlled through an operator property. When attributes are disabled, the tag of an XML leaf element typically matches the name of the tuple field that receives its value. For example, a tuple field named MyInt of type int receives the value 123 when the XML element <MyInt>123</MyInt> is processed.
When attributes are enabled, an XML element's value and attributes are retrieved through subtuples of the tuple whose name matches the XML element's tag. For example, to retrieve the value and attributes of an XML element <MyInt myattr="myattrvalue">123</MyInt>, a tuple field named MyInt of type tuple containing two subfields named _VALUE and _ATTRIBUTES should be present in the output schema. The _VALUE subfield would be of type int and receive 123, while the _ATTRIBUTES subfield would be of type list<tuple<string Name, string Value>> and receive a list with a single tuple whose Name and Value fields would contain myattr and myattrvalue, respectively.
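As a rough illustration of the _VALUE and _ATTRIBUTES layout, the sketch below reads an element's value and its attributes as name/value pairs with the JDK's DOM API. It is not the operator's code; the class and method names are hypothetical, and Map.Entry merely stands in for tuple<string Name, string Value>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical helper: mirrors the _VALUE and _ATTRIBUTES subfields.
class ElementAttributesSketch {

    // Parses the message and returns its root element.
    private static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // What an int _VALUE subfield would receive.
    static int readValue(String xml) {
        return Integer.parseInt(root(xml).getTextContent().trim());
    }

    // What an _ATTRIBUTES subfield would receive: one (Name, Value)
    // pair per attribute present on the element.
    static List<Map.Entry<String, String>> readAttributes(String xml) {
        NamedNodeMap map = root(xml).getAttributes();
        List<Map.Entry<String, String>> attributes = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            Attr attr = (Attr) map.item(i);
            attributes.add(Map.entry(attr.getName(), attr.getValue()));
        }
        return attributes;
    }
}
```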
An alternate mechanism is available for retrieving attribute values. Rather than using an _ATTRIBUTES subfield, a subfield with a name matching the attribute name and type compatible with the attribute value can be used. Thus, to retrieve the myattr attribute from the XML element above, a subfield named myattr of type string could be used.
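Looking up an attribute by name, as the alternate mechanism does, corresponds roughly to the DOM call sketched here. The class and method names are hypothetical, and returning null for a missing attribute is an assumption modeled on the rule that fields absent from the message are set to null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: retrieves one attribute by name.
class NamedAttributeSketch {

    // Returns the attribute's value, or null when the root element
    // does not carry an attribute with that name.
    static String readAttribute(String xml, String attributeName) {
        try {
            Element element = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return element.hasAttribute(attributeName)
                    ? element.getAttribute(attributeName)
                    : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```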
Note
When attributes are enabled, a _VALUE subfield does not need to be used to retrieve an XML element's value if no attributes are to be retrieved from that specific element. In that case, a tuple field with a name matching the XML element's tag can be used.
This section describes the properties you can set for an XML to Tuple operator, using the various tabs of the Properties view in StreamBase Studio.
Name: Use this field to specify or change the component's name, which must be unique in the application. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Operator: A read-only field that shows the formal name of the operator.
Class: A field that shows the fully qualified class name that implements the functionality of this operator. Use this class name when loading the operator in StreamSQL programs with the APPLY JAVA statement. You can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start with application: If this field is set to Yes or to a module parameter that evaluates to true, an instance of this operator starts as part of the containing StreamBase Server. If this field is set to No or to a module parameter that evaluates to false, the operator is loaded with the server, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager. With this option set to No or false, the operator does not start even if the application as a whole is suspended and later resumed. The recommended setting is selected by default.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports and Error Streams to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
Property | Description |
---|---|
Element Value Field Name | The name of the tuple subfield that receives an XML element's value; the default is _VALUE. This field must be used when attributes are being retrieved from an XML element. When attributes are disabled, or no attributes are being retrieved from a particular XML element, a tuple field with a name matching the XML element's tag can be used. |
Attribute Values Supported | If enabled (the default), attributes can be retrieved from XML elements, either through a field specified by the Attribute Values Field Name property or through a field with the same name as the XML attribute. |
Attribute Values Field Name | The name of the tuple subfield that receives an XML element's attributes; the default is _ATTRIBUTES. The schema of attribute value fields must be list<tuple<string Name, string Value>>. |
Date/Time Format | The format to use when converting date-time strings to timestamps while parsing XML messages. The syntax of the format string is that of the java.text.SimpleDateFormat class, described in the Java Platform SE reference documentation. Typical format string values include yyyyMMdd and yyyyMMdd HH:mm:ss. |
Assume Local Time Zone | If enabled, date-time strings containing no timezone specifier are assumed to represent local time. If disabled (the default), date-time strings are assumed to represent GMT. |
Include Null List Values | If enabled (the default), list values containing nulls are included in the generated tuple. |
Null List Value Representation | Representation of null list values in XML. The default is null. |
Use Namespaces | If enabled (default is disabled), the system tries to match namespaces, as well as XML elements, to schema field names. If disabled, the namespaces in the XML are ignored. For example, if enabled and an XML element is <n0:root>, the schema field name should be #"n0:root". |
Namespace Field Separator | The string used to join the namespace to the element name when matching against a schema field. For example, if the separator is _ and the XML element is <n0:root>, the schema field name should be n0_root. |
Enable Pass Through Fields | When enabled, all fields from the incoming tuple are replicated in the output. When selecting this option, you must specify the XML Field parameter. Default is disabled. |
XML Field | Identifies the field of the incoming tuple that contains the XML data. This parameter is only used when Enable Pass Through Fields is enabled. |
Field Name Replacements | Specifies key-value pairs for mapping XML element tags to field names. The mappings are applied before trying to match XML elements to schema field names. For example, if a key is '-' and its associated value is '_', then an <Activity-Input> element is converted to <Activity_Input> before attempting to match to a schema field. |
Enable Status Port | If enabled (the default), status tuples are sent to port 2. If disabled, no status is reported. If disabled after previously being enabled, the arc connected to port 2 is deleted. |
Log Level | Controls the level of verbosity the operator uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE, and ALL. |
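To make the Date/Time Format and Assume Local Time Zone properties more concrete, the sketch below parses a date-time string with java.text.SimpleDateFormat and chooses between GMT and the JVM's default zone. It assumes the operator's behavior resembles plain SimpleDateFormat parsing; the class name, method name, and assumeLocal parameter are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: SimpleDateFormat parsing with a GMT or local default.
class DateTimeFormatSketch {

    // Parses `text` using `pattern`. The zone set here applies only when
    // the text itself carries no zone information: GMT when assumeLocal
    // is false, the JVM's default zone when it is true.
    static Date parse(String text, String pattern, boolean assumeLocal) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(assumeLocal
                ? TimeZone.getDefault()
                : TimeZone.getTimeZone("GMT"));
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "'" + text + "' does not match " + pattern, e);
        }
    }
}
```

With assumeLocal set to false, parsing 19700102 00:00:00 with the pattern yyyyMMdd HH:mm:ss gives a Date exactly 86,400,000 milliseconds after the epoch. With assumeLocal set to true, the result shifts by the local zone's offset.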
Property | Description |
---|---|
XPath Expressions | A mapped list of XPath expressions to use and their associated schema field names. |
- The XPath can be any valid XPath v1 statement.
- Each XPath must be mapped to a top-level schema field name from the Edit Schema tab.
- The schema field can be of any type, but note that if the XPath produces multiple node values and the data type is not a list, then the last node is used as the value.
- To produce XML strings from the XPath statement, the field name must end with __XML and the data type must be a string or list of strings.
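The multiple-node rule above can be tried out with the JDK's javax.xml.xpath package, which implements XPath 1.0. This is a sketch under the assumption that the operator evaluates expressions in a comparable way; the class and method names are hypothetical, and the __XML serialization case is not covered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: list versus non-list results of one XPath.
class XPathFieldSketch {

    // What a list<string> field would receive: the text of every
    // node selected by the expression, in document order.
    static List<String> evaluateAsList(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    // What a non-list field would receive: the last selected node,
    // or null when the expression selects nothing.
    static String evaluateAsScalar(String xml, String expression) {
        List<String> values = evaluateAsList(xml, expression);
        return values.isEmpty() ? null : values.get(values.size() - 1);
    }
}
```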
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Caution
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
As shown in the diagram below (depicting one of the operator's sample applications), the operator has one input port and two output ports to communicate with the surrounding application.
The XML to Tuple operator's ports are used as follows:
- XMLIn: The XML message to be converted to a tuple. The XMLIn port has the following schema:
  - XML, string: The contents of the XML message to be converted.
- TupleOut: This output port's schema contains one or more top-level fields, each of which receives the result of parsing an XML message with a different top-level tag. For example, a TupleOut schema having top-level fields MyInt and MyString could be used to parse XML messages <MyInt>123</MyInt> and <MyString>This is a string</MyString>, respectively. If Enable Pass Through Fields is checked, this port also contains all the fields from the input port.
- Status: A tuple is emitted on this port when an attempt to convert an XML message to a tuple fails. The Status port has the following schema:
  - type, string: Contains the following value describing the type of event that occurred:
    - Convert
  - action, string: Contains the following value indicating the conversion failed:
    - Failed
  - object, string: Contains a string representation of the input tuple.
  - message, string: Contains a human-readable description of the conversion failure.
  - time, timestamp: Contains the time of the conversion failure.
  - inputTuple, tuple: Contains a copy of the input tuple.
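The shape of a Status tuple can be pictured with a small plain-Java stand-in. The sketch below is an assumption-laden illustration, not the operator's code: the class names are hypothetical, a well-formedness check stands in for the full conversion, and the inputTuple field is omitted because the sketch has no tuple type.

```java
import java.io.StringReader;
import java.time.Instant;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: builds a Status-like record when parsing fails.
class StatusTupleSketch {

    // Plain-Java stand-in for the Status port schema.
    static final class Status {
        final String type = "Convert";   // kind of event
        final String action = "Failed";  // outcome of the event
        final String object;             // string form of the input
        final String message;            // human-readable description
        final Instant time;              // when the failure occurred

        Status(String object, String message, Instant time) {
            this.object = object;
            this.message = message;
            this.time = time;
        }
    }

    // Returns null when the XML parses, or a Status describing the failure.
    static Status tryConvert(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (Exception e) {
            return new Status(xml, String.valueOf(e.getMessage()), Instant.now());
        }
    }
}
```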
The XML to Tuple operator uses typecheck messages to help you configure the operator in your StreamBase application. In particular, the operator generates typecheck messages when:
- The XMLIn port schema does not contain exactly one field of type string and Enable Pass Through Fields is unchecked.
- The TupleOut port schema contains a field of type list<list<?>> (which is not allowed).
- The TupleOut port schema contains an Element Value Field (default name _VALUE) of type tuple or list (which is also not allowed).
- The TupleOut port schema contains an Attribute Values Field (default name _ATTRIBUTES) that is not of type list<tuple<string Name, string Value>>.
- The Attribute Values Supported property is enabled and no Attribute Values Field Name is specified.
- The Element Value Field Name and Attribute Values Field Name properties contain the same non-empty value.
- The TupleOut port schema contains at least one timestamp field and no Date/Time Format string is specified.
- The Enable Pass Through Fields property is checked and no XML Field is specified.
- A value is specified in the Field Name Replacements property but no key.
- An invalid Date/Time Format string is specified.
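Several of the property-level checks in this list can be expressed as simple validation logic. The sketch below covers only the rules that do not depend on port schemas. The class name, parameter names, and message texts are hypothetical, and using the SimpleDateFormat constructor to detect an invalid format string is an assumption about how such a check might be done.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: property-level checks modeled on the list above.
class TypecheckSketch {

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Returns one message per violated rule; an empty list means
    // these particular checks passed.
    static List<String> check(boolean attributesSupported,
                              String valueFieldName,
                              String attributesFieldName,
                              boolean passThrough,
                              String xmlField,
                              String dateTimeFormat) {
        List<String> errors = new ArrayList<>();
        if (attributesSupported && isBlank(attributesFieldName)) {
            errors.add("Attribute Values Field Name is required");
        }
        if (!isBlank(valueFieldName) && valueFieldName.equals(attributesFieldName)) {
            errors.add("Element Value and Attribute Values field names must differ");
        }
        if (passThrough && isBlank(xmlField)) {
            errors.add("XML Field is required with Enable Pass Through Fields");
        }
        if (!isBlank(dateTimeFormat)) {
            try {
                // The constructor rejects unknown pattern letters.
                new SimpleDateFormat(dateTimeFormat);
            } catch (IllegalArgumentException e) {
                errors.add("Invalid Date/Time Format: " + e.getMessage());
            }
        }
        return errors;
    }
}
```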
The operator generates messages on the status port when an attempt to convert an XML message to a tuple fails.