StreamBase recognizes the following data types:
See Specifying Literals in Expressions for information on entering literals for each data type in expressions.
Blobs provide a way of representing binary data in a tuple. They are designed to efficiently process large data objects such as video frames or other multimedia data, although performance might diminish with larger sizes.
In expressions, blobs can be used in any StreamBase function that supports all data types, such as string() or firstval(). To use a blob in other functions, the blob must first be converted to a supported data type. For example, error() does not accept a blob, but you can call error(string(b)), where you first recast the blob as a string.
The StreamBase blob() function converts a string to a blob.
Two or more blob values are comparable with relational operators; the comparison is bytewise.
A Boolean always evaluates to either true or false.
Two or more bool values are comparable with relational operators as follows: true > false.
When you declare a capture field as part of the schema for a hygienic module, you assign a type name for that capture field. Thereafter, that capture type name appears in the drop-down list of data types in Studio, and can be referenced as @capture-typename in expressions. While a capture field is not strictly a type of data in the same sense as the other types on this page, a capture field's type name is effectively a data type in some circumstances. See Capture Fields in the Authoring Guide.
A double is always 8 bytes. Any numeric literal in scientific notation is considered a double. That is, you do not need a decimal point for a numeric literal to be recognized as a double (instead of an int). For example, 1e1 is considered to be 10.0 (double) instead of 10 (integer).
The StreamBase double type conforms to the IEEE 754 binary64 specification. If the precision of a value needs to be corrected before printing, writing to a log, or sending to an external system, use the round() function as the last step. Do not use round() before or within an aggregation or an iterative, looping, or large summation calculation, as this would introduce cumulative mathematical error.
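The effect of rounding too early can be seen outside StreamBase. The following is a minimal Python sketch, not StreamBase code: Python's float is also an 8-byte IEEE 754 binary floating-point value, Python's built-in round() stands in for the StreamBase round() function, and the sample values are invented for illustration.

```python
# Illustrative only: Python floats stand in for StreamBase doubles.
values = [0.004] * 1000  # hypothetical sample data

# Rounding each value before summing discards information at every step.
rounded_early = sum(round(v, 2) for v in values)

# Summing at full precision and rounding once keeps the error small.
rounded_late = round(sum(values), 2)

print(rounded_early)  # 0.0
print(rounded_late)   # 4.0
```

Rounding inside the summation loses the entire total in this case, while rounding once at the end gives the expected value.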
Two or more double values are comparable numerically with relational operators.
function is a StreamBase data type and a reserved keyword. Use a constructor for the function data type to create a custom expression language function whose components are built-in functions, math operators, and even other functions.
The primary use for the function data type is to declare custom expression language functions in various contexts, and to call them like built-in expression language functions. A function can also be the data type of a field on a stream, and thus passed to other components. In addition, there are certain higher order functions such as foldleft that take a defined function as an argument.
The function data type is described in more detail in Using the Function Data Type.
An int is always 4 bytes. The valid range of the int data type is -2,147,483,648 [-2^31] to 2,147,483,647 [2^31 - 1], inclusive. Integers are always signed. Thus, the following expression is not valid, because the number 2,147,483,648 is not a valid int:
2147483648 + 0
However, integer computations wrap around cleanly, so the following expression is valid and evaluates to -2,147,483,648:
2147483647 + 1
The following expression is valid, because the period means that the number is treated as a floating-point value, which can have a much greater value:
2147483648. + 0.
Two or more int values are comparable numerically with relational operators.
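The wraparound described above is ordinary signed 32-bit arithmetic. The following Python sketch models it for illustration only. Python integers do not overflow, so the hypothetical helper wrap_int32() reduces a result to the int range by hand; it is not a StreamBase function.

```python
INT_MIN = -2**31
INT_MAX = 2**31 - 1

def wrap_int32(value: int) -> int:
    """Reduce an arbitrary integer to the signed 32-bit range by wrapping."""
    return (value - INT_MIN) % 2**32 + INT_MIN

print(wrap_int32(INT_MAX + 1))  # -2147483648
print(wrap_int32(42))           # 42
print(2147483648.0 + 0.0)       # 2147483648.0
```

The last line mirrors the floating-point case: once the operands are doubles, the 32-bit limit no longer applies.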
A list is an ordered collection of values, called elements, each of which is of the same StreamBase data type, called the list's element type. The element type can be any StreamBase data type, such as an int, a tuple, or even a list (thus allowing for constructions such as a list of list of int).
In addition to the list() function, you use the [] constructor to create a list. For example, ['this','constructs','a','list','of','type','string',]. As the example illustrates, the last list element can optionally be followed by a comma (which is ignored) to make it easier to add a new element to the list if you later need to.
Lists are returned by a variety of functions in the StreamBase expression language, such as list() and range().
Individual elements in a list can be accessed using their zero-based integer position (their index) in the list. In any expression in an EventFlow module, use brackets to address individual elements of a list. Thus, for a field named L with data type list, use L[0] to address the first element in the list, L[1] for the second element, and L[length(L)-1] to address the last element.
In most list-related functions that take an index, you can also use a negative index to count backward from the end of the list. Thus, for a list L, L[-1] is equivalent to L[length(L)-1].
The number of elements in a list can range from zero to a theoretical maximum of 2^31 - 1 elements (although that maximum cannot be reached in typical practice). The number of elements in a list is determined at application runtime.
Two or more lists are comparable with relational operators. The comparison is performed lexicographically: that is, like words in a dictionary, with each element in list AAA compared to its corresponding element in list BBB, just as the letters in two words are compared, one by one. For example, the list [1, 2, 3] is less than the list [1, 9, 3], and the list [1, 2] is less than the list [1, 2, 3].
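The lexicographic rule can be written out step by step. The following Python sketch is an illustration of the rule as described above, not StreamBase's implementation; compare_lists() is a hypothetical helper name.

```python
def compare_lists(a, b):
    """Return -1, 0, or 1 by comparing element by element, then by length."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared positions are equal, so the shorter list sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

print(compare_lists([1, 2, 3], [1, 9, 3]))  # -1
print(compare_lists([1, 2], [1, 2, 3]))     # -1
print(compare_lists([1, 2, 3], [1, 2, 3]))  # 0
```

The first differing element decides the result; only when one list is a prefix of the other does length matter.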
Lists with numeric element types can be coerced if two lists are concatenated or merged in a Union operator, following the rules listed in Data Type Coercion and Conversion. For example, if you have a list(int) merging with a list(double), the result is a merged list(double). Two list(tuple) will successfully merge if a valid supertype tuple can be found. Coercion rules do not apply to output streams with declared, explicit schemas.
See Null Lists for a discussion of null lists compared to empty lists.
Most lists you deal with in the StreamBase expression language are tuple fields with a declared list type in the tuple's schema. You address such a list by means of the field's name, optionally using the bracket syntax shown above to address individual elements of the list. Functions are provided to append to a list, insert elements, replace elements, return the first, last, minimum, or maximum element in a list, and much more. See the list management elements in Simple Functions: Lists.
You can create lists of your own using one of the functions that returns a list, such as list() and range(). As an alternative, you can define a list and specify its contents by placing a comma-separated list of elements in square brackets. Thus, list(100.00, 130.00, 78.34) and [100.00, 130.00, 78.34] express the same list.
In contexts where list data appears in string form, such as the output of sbc dequeue, lists are output in standard array format within square brackets. For example:
list(int) [1,3,5]
list(double) [34.78,123.23,90.84,85.00]
Lists of strings do not show each element enclosed in quotes, because the element type of such a list is known to be string. For example:
list(string) [IBM,DELL,HPQ]
When specifying lists as input to sbc enqueue, enclose the list in quotes to escape the commas inside the list brackets. For list(string), there is no need to quote each element. For example, to input data for a stream with schema {int, list(double), list(string), list(int)}, use the following format:
9456,"[234.0,2314.44]","[IBM,DELL]","[3000,2000]"
When specifying strings and lists that occur within a tuple data type, use one pair of quotes around the tuple value, and use two pairs of quotes to surround the string and list members of that tuple. For example, to input data for a stream with schema tuple(int, int, int), tuple(string, list(string)), use the following format:
"1, 3, 3"," ""Alpha"", ""[Beta,Gamma,Delta]"" "
In the example above, quotes surround the first tuple field, consisting of three int values. Quotes surround the second tuple field, from the comma after the 3 to the end. Within the second field, two pairs of quotes surround the string sub-field, and surround the list(string) sub-field. Notice that there is still no need to quote each element of the list(string) sub-field.
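To see how the nested quotes resolve, the input line can be read back with a generic CSV parser. The sketch below uses Python's standard csv module as a stand-in; it is not how sbc enqueue is implemented, and it assumes sbc enqueue follows the same quote-doubling convention.

```python
import csv

line = '"1, 3, 3"," ""Alpha"", ""[Beta,Gamma,Delta]"" "'

# Outer pass: split the line into the two tuple fields.
outer = next(csv.reader([line]))

# Inner pass: the second field is itself CSV once the outer quotes are removed.
inner = next(csv.reader([outer[1].strip()], skipinitialspace=True))

print(outer[0])  # 1, 3, 3
print(inner)     # ['Alpha', '[Beta,Gamma,Delta]']
```

The outer pass turns each doubled quote into a single quote, which the inner pass then treats as an ordinary field delimiter.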
When using sbc dequeue with its -v option, elements in a list of strings are shown surrounded with \", backslash-quote. This is a display-only convention meant to convey that the quotes around list(string) elements are escaped. Do not use backslash-quote to enter data with sbc enqueue. For example, the previous example displayed by sbc dequeue -v looks like the following example:
OutputStream1: (tupleid=3,T=(i1=1,i2=3,i3=3),W=(x1="Alpha", x2="[\"Beta\",\"Gamma\",\"Delta\"]"))
(This example is shown here on two lines for clarity, but appears as a single unbroken line in practice.)
A long is always 8 bytes. The range is -9,223,372,036,854,775,808 [-2^63] to +9,223,372,036,854,775,807 [2^63 - 1]. You can use the long data type to contain integer numbers that are too large to fit in the four-byte int data type.
When specifying a long value in a StreamBase expression, append L to the number. Thus, 100L and 314159L are both long values. Without the L, StreamBase interprets values in the int data type's range as ints. Values outside the range of an int are interpreted as longs without the L.
Two or more long values are comparable numerically with relational operators.
When you define a named schema for a module or interface, StreamBase automatically generates a new function in the StreamBase expression language that allows you to construct tuples with that schema. Thereafter, the names of named schemas appear in the drop-down list of data types in Studio, which allows you to use a named schema's name wherever you would use the tuple data type. Thus, while not strictly a type of data in the same sense as the other entries on this page, the names of named schemas can be used as an effective data type. See Named Schema Constructor Function in the Authoring Guide.
A string is a field of text characters.
The theoretical maximum length for a string is maxint() characters, but the practical limit is much smaller. While StreamBase does support large tuples, including large string fields, be aware that moving huge amounts of data through any application negatively impacts its throughput.
Two or more string values are comparable with the relational operators. By default, strings are compared lexicographically based on ASCII sort order. If Unicode support is enabled for StreamBase Server (as described in Unicode Support), string elements are compared in the sort order for the current character set.
The timestamp data type can hold either an absolute timestamp or an interval timestamp.
An absolute timestamp represents a date and time. Its value is the number of seconds between the epoch and that date and time, with a maximum precision of milliseconds. The epoch is defined as midnight of January 1, 1970 UTC.
An interval timestamp represents a duration. Its value is the number of seconds in the interval, with a maximum precision of milliseconds.
The range for timestamp values is -2^62 to (2^62 - 1), which holds absolute timestamps for plus or minus 146 million years, and holds interval timestamp values between -4,611,686,018,427,387,904 and +4,611,686,018,427,387,903.
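The figure of 146 million years can be checked with a short calculation. The Python sketch below assumes that the stored value counts milliseconds (consistent with the stated millisecond precision, but an assumption here) and uses a 365.25-day year.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25  # assumes a 365.25-day year

min_value = -2**62
max_value = 2**62 - 1

# Assumption: the range is counted in milliseconds.
years = 2**62 / MS_PER_YEAR

print(min_value)           # -4611686018427387904
print(max_value)           # 4611686018427387903
print(round(years / 1e6))  # 146
```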
Absolute timestamps are expressed in the time format patterns of the java.text.SimpleDateFormat class described in the Oracle Java Platform SE reference documentation. For example, the now() function returns a timestamp value for the current time. The returned value is a representation of the internal value as a date and time. Thus, the now() function returns the following when run on 22 May 2019 in the US Eastern time zone during daylight saving time (EDT):
2019-05-22 20:15:54.880-0400
By contrast, the expression hours(1) returns an interval timestamp, showing the number of seconds in one hour:
3600.000
You can add and subtract timestamp values in expressions, using the rules in the following table:
Operation | Result | Example |
---|---|---|
interval + interval | interval | days(1) + hours(2) Result: 93600.0, the number of seconds in 26 hours. |
interval – interval | interval | days(1) - hours(2) Result: 79200.0, the number of seconds in 22 hours. |
absolute + interval | absolute | now() + hours(1) Result: an absolute timestamp representing the time one hour from now. |
absolute – absolute | interval | today_utc() - today() Result: an interval timestamp representing the number of seconds between midnight UTC and midnight in the local time zone. |
absolute + absolute | absolute | Adding two absolute timestamp values does not produce an error, but the results are undefined. |
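
The rules in the table have a close analogue in Python's datetime module, where datetime plays the role of an absolute timestamp and timedelta the role of an interval. The sketch below is for illustration only and is not StreamBase code; the sample instant is arbitrary.

```python
from datetime import datetime, timedelta, timezone

day, two_hours = timedelta(days=1), timedelta(hours=2)
now = datetime(2019, 5, 22, 20, 15, 54, tzinfo=timezone.utc)  # fixed sample instant

print((day + two_hours).total_seconds())        # 93600.0
print((day - two_hours).total_seconds())        # 79200.0
print(type(now + timedelta(hours=1)).__name__)  # datetime
print(type(now - now).__name__)                 # timedelta

# Python rejects absolute + absolute outright; StreamBase, per the table
# above, accepts it but leaves the result undefined.
try:
    now + now
    absolute_sum_allowed = True
except TypeError:
    absolute_sum_allowed = False
print(absolute_sum_allowed)  # False
```

The one difference worth noting is the last case: Python raises an error, whereas StreamBase evaluates the expression without one.
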
Two or more timestamp values are comparable with relational operators such as > and <. You must compare timestamp values interval-to-interval or absolute-to-absolute. You cannot compare interval-to-absolute or absolute-to-interval.
In comparison expressions that use the operators ==, !=, <=, >=, <, or >, if one side of the comparison is a timestamp, and the other side is a string literal, StreamBase tries to interpret the string as a valid timestamp. If the string literal does not contain an explicit time zone, the string is interpreted as having the time zone set in the operating system of the computer that compiles the application. If the conversion of the string literal fails, then the comparison fails typechecking.
StreamBase supports three ways of specifying time zones. The following examples all indicate the same time zone (Central Europe):
Content | Data Type | Example |
---|---|---|
Offset from UTC in hours:minutes | double | |
Zone Abbreviation | string | |
Time Zone ID | string | |
Certain timestamp functions allow you to use these specifications as arguments, but not interchangeably. Selected time zone IDs for the United States, Europe, and Asia are shown below. Note that time zone ID literals are case-sensitive.
Time Zone or City | Time Zone ID | UTC Offset | DST Offset |
---|---|---|---|
Buenos Aires, Argentina | | -03:00 | -03:00 |
US Eastern Standard Time | | -05:00 | -04:00 |
US Central Standard Time | | -06:00 | -05:00 |
US Mountain Standard Time (for areas that do not observe DST) | | -07:00 | -07:00 |
US Mountain Standard Time (for areas that observe DST) | | -07:00 | -06:00 |
US Pacific Standard Time | | -08:00 | -07:00 |
US Alaska Standard Time | | -09:00 | -08:00 |
US Hawaii Standard Time | | -10:00 | -10:00 |
Greenwich Mean Time | | 00:00 | 00:00 |
London, UK | | +00:00 | +01:00 |
Zurich, Switzerland | | +01:00 | +02:00 |
Tallinn, Estonia | Europe/Tallinn | +02:00 | +03:00 |
Moscow, Russia | | +03:00 | +03:00 |
Pune, India | | +05:30 | +05:30 |
Bangkok, Thailand | | +07:00 | +07:00 |
Singapore | | +08:00 | +08:00 |
Seoul, Korea | | +09:00 | +09:00 |
Auckland, New Zealand | | +12:00 | +13:00 |
For a full listing, see the Wikipedia article List of tz database time zones. You can also obtain the list by calling the Java function TimeZone.getAvailableIDs(). See the Javadoc for class TimeZone for details.
The get_* functions (get_second(), get_year(), and so on) take time zone IDs as an optional second argument to obtain timestamp fields for a given time zone rather than for local time, as described in Timestamp Fields.
The tuple data type is an ordered collection of fields, each of which has a name and a data type. The fields in the collection must be defined by a schema, which can be unnamed or named. Fields can be of any StreamBase data type, including other tuples, nested to any depth. The size of a tuple depends on the aggregate size of its fields.
See Null Tuples for a discussion of null tuples and empty tuples.
Two or more tuples are comparable with relational operators as long as the tuples being compared have identical schemas.
The following sections discuss features of the tuple data type:
Addressing Tuple Sub-Fields |
Using the Tuple Data Type in Expressions |
Viewing and Specifying Tuple Data in CSV Format |
Copying Tuple Contents |
Null Tuples, Empty Tuples, No-Fields Tuples |
In expressions, you can address a tuple field's individual sub-fields using dot notation: tuplename.tuplefieldname.
In an EventFlow module, tuplename is the name of a field of type tuple, and tuplefieldname is the name of a sub-field.
In an expression, use the tuple() function to create both schema and field values of a single tuple.
The name of a named schema automatically becomes a generated function that returns a single tuple with that schema. See named schema constructor function for details.
In contexts where a tuple value appears in textual string form, comma-separated value (CSV) format is used. Examples of such contexts include the contents of files read by the CSV Input Adapter, written by the CSV Output Adapter, and in the result of the Tuple.toString() Java API method.
Use the nested quote techniques in this section to enter a field of type tuple when specifying input data at the command prompt with sbc enqueue.
The string form of a tuple with three integer fields whose values are 1, 2, and 3 is the following:
1,2,3
We will refer to the above as tuple A.
When tuple A appears as a field of type tuple inside another tuple, surround tuple A with quotes. For example, a second tuple, B, whose first field is a string and whose second field is tuple A, has a CSV format like the following:
IBM,"1,2,3"
These quotes protect the comma-separated values inside the second field from being interpreted as individual field values.
With deeper nesting, the quoting gets more complex. For example, suppose tuple B, the two-field tuple above, is itself the second field inside a third tuple, C, whose first field is a double. The CSV format of tuple C is:
3.14159," IBM,""1,2,3"" "
The above form shows doubled pairs of quotes around 1,2,3, which is necessary to ensure that the nested quotes are interpreted correctly. There is another set of quotes around the entire second field, which contains tuple B.
StreamBase's quoting rules follow standard CSV practices, as defined in RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files.
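The nesting of tuples A, B, and C can be reproduced with any CSV writer that doubles embedded quotes. The sketch below uses Python's standard csv module for illustration; it is not StreamBase code, and to_csv() is a hypothetical helper name.

```python
import csv
import io

def to_csv(fields):
    """Serialize one row, quoting only the fields that need it."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue().rstrip("\r\n")  # drop the writer's line ending

tuple_a = to_csv([1, 2, 3])
tuple_b = to_csv(["IBM", tuple_a])
tuple_c = to_csv([3.14159, tuple_b])

print(tuple_a)  # 1,2,3
print(tuple_b)  # IBM,"1,2,3"
print(tuple_c)  # 3.14159,"IBM,""1,2,3"""
```

The output for tuple C matches the form shown above except for the padding spaces inside the outer quotes, which this writer does not add.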
You can duplicate any tuple field into another field of type tuple without using wildcards. For example, a Map operator might have an entry like the following in its Additional Expressions grid, where both IncomingTuple and CopyOfIncomingTuple are the names of tuple fields:
Action | Field Name | Expression |
---|---|---|
Add | CopyOfIncomingTuple | IncomingTuple |
Use the .* syntax to flatten a tuple field into the top level of a stream.
For example, a Map operator might define an entry like the following in its Additional Expressions grid. When using this syntax, you must have an asterisk in both Field Name and Expression columns.
Action | Field Name | Expression |
---|---|---|
Add | * | IncomingTuple.* |
Use the * AS * syntax for tuples defined with a named schema to copy the entire tuple into a single field of type tuple.
For example, let's say the tuple arriving at the input port of a Map operator was defined upstream with the NYSE_FeedSchema named schema. To preserve the input tuple unmodified for separate processing, the Map operator could add a field of type tuple using settings like the following in the Additional Expressions grid. When using the * AS * syntax in the Expression column, the name of the tuple field in the Field Name column has an implied asterisk for all of its fields.
Action | Field Name | Expression |
---|---|---|
Add | OriginalOrder | NYSE_FeedSchema(input1.* as *) |
Because the Map operator has only one input port, the port does not need to be named:
Action | Field Name | Expression |
---|---|---|
Add | OriginalOrder | NYSE_FeedSchema(* as *) |
A null tuple results when the entire tuple is set to null (not just the fields of the tuple).
An empty tuple is a tuple with each individual field set to null.
A no-fields tuple is what is sent to an input stream that has an empty schema, which is a schema with no fields defined, as described in Using Empty Schemas. An input stream with an empty schema might be declared, for example, as a trigger for a Read All Rows operation on a Query Table. In this case, the tuple sent to this input stream is itself neither null nor empty, it is a no-fields tuple.
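The three cases can be told apart mechanically. The following Python sketch models them for illustration only: a dict stands in for a tuple, None stands in for null, and tuple_state() and the field names are hypothetical.

```python
def tuple_state(value):
    """Classify a modeled tuple as null, no-fields, empty, or populated."""
    if value is None:
        return "null"        # the tuple itself is null
    if len(value) == 0:
        return "no-fields"   # the schema defines no fields at all
    if all(field is None for field in value.values()):
        return "empty"       # fields exist, but every one is null
    return "populated"

print(tuple_state(None))                            # null
print(tuple_state({}))                              # no-fields
print(tuple_state({"symbol": None, "qty": None}))   # empty
print(tuple_state({"symbol": "IBM", "qty": None}))  # populated
```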