This topic describes the syntax and use of each function in the StreamBase expression language. See StreamBase Expression Language Features for information on using the expression language in general. See StreamBase Data Types for usage information on StreamBase data types.
This page is organized into the following sections:
The following table is a complete, alphabetized index into the functions provided in the StreamBase expression language, with links to the description for each function.
The majority of functions in the StreamBase expression language are simple functions, which operate on a single tuple field at a time. You can use simple functions in expressions for any StreamBase operator (except the Heartbeat, Metronome, and Union operators, which do not accept expressions).
The expression language also supports aggregate functions, which are described in Aggregate Functions Overview.
Simple functions are organized into the following categories:
This category includes the following functions:
addbson() | bsontojson() | findbson() |
frombson() | joinbson() | jsontobson() |
lenbson() | listbson() | namebson() |
putbson() | removebson() | splitbson() |
stringbson() | tobson() | tuplebson() |
These functions create, manipulate and extract contents of BSON blobs. BSON blobs, as defined by bsonspec.org, are maintained in a binary format in which zero or more values, arrays, and objects with fields (key/value pairs) are stored as a single entity. A variety of implementations exist.
BSON is short for Binary JSON. BSON blobs serialize data in the manner of JSON, and like JSON, BSON data may include nested objects. JSON, as defined on http://www.json.org is text consisting of colon-separated name-value pairs (with field name quoted), and each pair separated by a comma. A BSON-encoded blob is sometimes referred to as a document.
The following table indicates how StreamBase types correspond to BSON types.
StreamBase Type | BSON Type |
int | int32 |
long | int64 |
double | double |
bool | Boolean |
string | UTF-8 string |
null | null |
timestamp | timestamp* |
blob | binary |
list | array |
tuple | object |
*The StreamBase BSON Java implementation for timestamps differs from the official BSON specification. They are both int64 but the StreamBase format is used internally rather than the bsonspec.org format.
Function syntax:
blob addbson(blob bsonblob-array, T element)
Adds a new field at the end of a BSON array. Accepts a BSON blob that contains an array and an element to be appended to it.
element
can be of any type compatible with the contents of the array bsonblob-array
. Returns a BSON object for the expanded array.
Function syntax:
string bsontojson(blob bsonblob)
Converts BSON blob to JSON-formatted string. Accepts one BSON blob and returns its contents as a JSON object of key-value pairs.
Function syntax:
blob findbson(blob bsonblob, int index) blob findbson(blob bsonblob, string fieldname)
Searches a BSON object for a specified array index or field name. Accepts a BSON blob assumed to contain either a BSON array
or a BSON object and an index into the array or a string that should match a field name. Returns a blob containing the found
field or throws an error if not found or index
is out of bounds. index
is a zero-based index. fieldname
is a string, interpreted as a field name.
Function syntax:
T frombson(blob bsonblob, T type)
Converts a BSON blob to the requested StreamBase type. Accepts one BSON blob and returns its contents as the requested StreamBase type. The second argument provides an example of the return type and is not evaluated. If the type does not match, throws an error.
Function syntax:
blob joinbson(list(blob) L)
Concatenates a list of BSON fields into a BSON object. Accepts a list of BSON blobs containing fields and concatenates them.
If the list argument is empty, returns an empty blob. If the list is null, returns null. To test for null blobs, use the isnull
function. You can use splitbson
to create a list of BSON fields.
Function syntax:
blob jsontobson(string jsonstring)
Converts a JSON string to a BSON blob. Accepts one JSON string and returns its contents as a BSON blob. The argument jsonstring
is any JSON string. For example:
jsontobson('{"symbol":"IBM","price":75.91,"quantity":150, "date":"2013-04-28 16:00:00.000-0400"}')
Function syntax:
int lenbson(blob bsonblob)
Returns the length of a BSON object or array.
If bsonblob
contains an array, returns the number of elements in it.
If bsonblob
is an object, returns the number of fields in it.
If bsonblob
is any other scalar type, throws an error.
Function syntax:
blob listbson(T1 value1, T2 value2, ... Tn valuen)
Accelerator for constructing a (possibly heterogeneous) BSON array, similar to a StreamBase list. Accepts any number of arguments
and returns a BSON array containing them. Values are individually interpreted by tobson
and can be of any type.
Function syntax:
string namebson(blob bsonblob)
Returns the name of a BSON field
If bsonblob
is a BSON field, returns the field name, otherwise returns an empty string.
Function syntax:
blob putbson(blob bsonblob-object, string name, T value)
Adds a new field to a BSON object. Returns the blob bsonblob-object
with added field name
having value value
. If field name
already exists, its current value is replaced with value
.
Function syntax:
blob removebson(blob bson-list, int element-index1, ... int element-indexN) blob removebson(blob bson-tuple, int element-index1|string fieldname1, ... int element-index2|string fieldnameN)
Removes fields or list elements from BSON blobs. The BSON blob can contain an array or an object. A blob is returned with fields or elements that match the specified element-index or fieldname arguments removed. Fields can be removed from objects by index or field name.
The order of indexes and names in arguments does not matter. Any out-of-bounds indexes and non-matching field names are ignored.
Function syntax:
list splitbson(blob bsonblob-containing-list) list splitbson(blob bsonblob-containing-object)
Returns a list of BSON fields from a BSON object or a list of BSON values from a BSON array. Use frombson
to convert individual returned values to StreamBase types.
Function syntax:
string stringbson(blob bsonblob-string)
Converts a BSON blob to a string. The BSON blob must contain one string. Unlike the string
function, no type conversion is performed.
Function syntax:
blob tobson(T arg)
Converts StreamBase values to a BSON blob. Each StreamBase int, long, double, string, timestamp, and blob maps to a respective BSON type, nulls map to nulls, lists map to BSON arrays, and tuples map to BSON objects.
Function syntax:
blob tuplebson(AS-expr1, AS-expr2, ... AS-exprN)
Accelerator for constructing a BSON object from comma-separated AS expressions (comma-separated value expressions, each field named with the AS keyword). The resulting blob contains an object having the specified fields and values.
The following example creates a BSON object with three fields:
tuplebson("IBM" AS symbol, 87.3875 AS price, 47 AS quantity)
This category includes the following functions:
Function syntax:
T catchexception(Targ1
[, ... [, Targn
]])
Attempts to evaluate all arguments in order, returning the first one that evaluates without an error, or null if all arguments evaluate to an error. Accepts all data types, but all arguments must have the same type. The returned value has the same data type, T, as the arguments.
For example, the following example attempts to parse str
as an interval, but returns a null timestamp if parsing fails:
catchexception(interval(str))
The following example attempts to divide a
by b
, but returns -1
if the division fails:
catchexception(a/b, -1)
Function syntax:
bool error(string message
)
Throws an error with message text message
that can be caught using the StreamBase error streams mechanism. In the default configuration, this produces an error message on the console, and the server continues processing.
Because error() returns bool, it works well as a predicate expression in operators. For example, use error("
as the last predicate expression for a Filter operator to produce an error message when no earlier predicate expression was
matched.
Error condition
")
Function syntax:
T throw(string message
)
Throws an error with message text message
that can be caught using the StreamBase error streams mechanism. In the default configuration, this produces an error message on the console, and the server continues processing.
Because throw() returns an untyped null by default, it works well in if
statements. For example: if (test-expr) then (5) else throw("Failed the test")
For the purposes of typechecking, cast the return of this function with one of the casting functions. For example, use bool(throw("Unmatched filter option"))
as the last predicate expression of a Filter operator to produce an error message when no earlier predicate expression was
matched.
This category includes the following functions:
Function syntax:
T callcpp(stringclass
[,arg1
, ... [,argn
]])
As an alternative to using callcpp()
, you can define an alias for your custom functions. See Using Function Aliases.
Use an alias or callcpp()
to run a custom C++ function directly from a StreamBase operator that uses an expression. The return type, T, of callcpp()
is the same as the return type of the called function. Custom C++ functions are functions you build with the StreamBase C++
Client API. You can use the simple form of callcpp()
in any expression except aggregate expressions. (To use callcpp()
in aggregate expressions, refer to the aggregate callcpp().)
For callcpp()
to locate the function being called, you must specify the location of the containing DLL or .so
file, and must register the function with a <custom-function>
element in the StreamBase Server configuration file. See C++ Function Overview for details.
To learn about coding custom C++ functions, refer to Creating Custom C++ Functions in the API Guide.
Function syntax:
T calljava(stringclass
, stringmethod
[,arg1
, ...,argn
])
As an alternative to using
, you can define an alias for your custom functions. See Using Function Aliases.
calljava
Use an alias or calljava()
to run a custom Java function directly from a StreamBase operator that uses an expression. The return type, T, of calljava()
is the same as the return type of the called function. Custom Java functions are functions you build with the StreamBase
Java Client library. You can use the simple form of calljava()
in any expression except aggregate expressions. (To use calljava()
in aggregate expressions, refer to the aggregate calljava().)
You can also use calljava()
(or an alias) to run any public function in any class on StreamBase Server's classpath, including standard library functions.
The calljava()
function distinguishes simple from aggregate functions by the number of arguments. For custom simple functions, you must
specify both class and method names. For custom aggregate functions, you specify only the class name.
To learn about coding custom Java functions, refer to Using the StreamBase Java Function Wizard in the API Guide. For information on the classpath requirements for custom Java functions, see Java Function Overview.
The function you write to be called with calljava()
can have any number of arguments, including none. Variable-length arguments are accepted by both simple and aggregate custom
functions. Each calljava()
argument arg1
through argn
associates a StreamBase data type with one of the following primitive or Java object types:
StreamBase Data Type | Java Primitive | Java Object |
---|---|---|
blob | — | com.streambase.sb.ByteArrayView |
bool | boolean | java.lang.Boolean |
double | double | java.lang.Double |
int | int | java.lang.Integer |
long | long | java.lang.Long |
list | — | java.util.List |
string | byte[] | java.lang.String |
timestamp | — | com.streambase.sb.Timestamp |
tuple | — | com.streambase.sb.Tuple |
In writing your Java function, use the Java primitive if the execution speed of your function is paramount. However, if you
pass a null for any argument that resolves in your function to a Java primitive, calljava()
returns an error and does not call your function. Only use Java primitives if your application otherwise insures that null
values cannot be passed to your function through calljava()
.
Use the Java object when your application might pass null to your function through calljava()
, and when no Java primitive is available.
This category includes the following functions:
Function syntax:
double black_scholes(string type, double underlying, double strike, double dividendYield, double riskFreeInterestRate, double Volatility, timestamp exerciseDate [, timestamp dealDate])
The black_scholes
function calculates fair-value and risk statistics (delta, gamma, vega) for European style options on securities with continuous
dividend yields. This is known as the Black-Scholes Generalized model. It also calculates implied volatility.
The black_scholes
function takes eight input arguments; the first seven arguments are required, while the eighth argument, dealDate
, is optional.
The arguments are:
-
type
: Required. Should be set to one of the following string values:-
Call (The price of an option.)
-
Put (The price of an option.)
-
DeltaCall (The sensitivity of the price of an option to changes in the price of the stock.)
-
DeltaPut (The sensitivity of the price of an option to changes in the price of the stock.)
-
ThetaCall (Theta measures how the price of an option changes with time.)
-
ThetaPut (Theta measures how the price of an option changes with time.)
-
Gamma (The sensitivity of a stock's delta to the stock price.)
-
Vega (The rate of change of the value of an option with respect to the volatility of the stock's price.)
-
RhoCall (The sensitivity of the price of an option with respect to the risk-free interest rate.)
-
RhoPut (The sensitivity of the price of an option with respect to the risk-free interest rate.)
-
ImpliedVolatilityCall (Implied volatility of the underlying stock for a given price.)
-
ImpliedVolatilityPut (Implied volatility of the underlying stock for a given price.)
-
-
underlying
: Required. A double. The price of the underlying stock. -
strike
: Required. A double. The strike price of the stock on the exercise date. -
dividendYield
: Required. A double. For example, 0.03 for 3.0%. -
riskFreeInterestRate
: Required. A double. For example, 0.05 for 5.0%. -
Volatility
(or value): Required. A double. This input argument is overloaded. When computing anything except the implied volatility for a given call or put price this argument is the volatility of the stock. For example, 0.2 for 20%. When computing implied volatility this argument is the option price. -
exerciseDate
: Required. An absolute or interval timestamp. An absolute timestamp represents the option exercise date, while an interval timestamp represents the number of days to exercise. -
dealDate
: Optional. A timestamp representing the option deal date. If not provided, the current date is the default.
Function syntax:
double compound_interest(double principalValue, double matureValue, double numberOfPeriods [, double guess])
Returns the compound interest given the principal value, mature value, and number of periods. An optional fourth argument,
guess
, can be used to set the initial compound interest used by the function. The compound_interest
function uses the Newton-Raphson algorithm to compute compound interest rate. This algorithm performs the same calculations
repetitively. Each iteration results in a compound interest that is closer to the final result. The number of iterations required
to get the results within an acceptable inaccuracy (1.0e-4
) depends on the number of periods and the initial compound interest used to start the calculation.
If the guess
argument is not provided, the default initial value is 0.01
(1%). If the function returns a NaN (not a number), you can provide the initial compound interest in this fourth argument.
If the initial value is closer to the actual rate of the computation, it can be sped up significantly.
This category includes the following functions:
Function syntax:
long inet_aton(string address
)
Takes a string containing an IPv4 address in dotted-quad notation ("nnn.nnn.nnn.nnn") and returns a long containing that address encoded in network byte order.
Function syntax:
string inet_ntoa(intaddress
) string inet_ntoa(longaddress
)
Takes an int or long containing an IPv4 address encoded in network byte order and returns a string containing that address in dotted-quad notation ("nnn.nnn.nnn.nnn").
A list is an ordered collection of values (called elements), each of which is of the same StreamBase data type, called the list's element type. The element type can be any StreamBase data type, such as an int, a tuple, or even a list (thus allowing for constructions such as a list of list of int).
Individual elements in a list can be accessed using their zero-based integer position (their index) in the list. In any expression in an EventFlow or StreamSQL program, use brackets to address individual elements of a list.
Thus, for a list field named L, use L[0]
to address the first element in the list, L[1]
for the second element, and L[length(L)-1]
to address the last element.
In most list-related functions that take an index, you can also use a negative index to count backward from the end of the
list. Thus, for a list L, L[-1]
is equivalent to L[length(L)-1]
.
The number of elements in a list is determined at application runtime. A null list is not the same as a list with zero elements (an empty list), which is not null.
Several functions that manipulate lists can take user-written functions as arguments. These are filterlist()
, foldleft()
, foldright()
, maplist()
, mergelist()
, and sort()
.
Function syntax:
list(T) append(list(T)L
, Te1
, Te2
, ..., Ten
)
Returns its argument list, L
, with elements e1
through en
added to the end of the list in the order specified. The data types of the appended elements e
and the element type of list L
must be the same, except that element types int and long can be coerced to long and double following the rules in Data Type Coercion and Conversion. The element type of the returned list is the same as the element type of the argument list and of the appended elements.
The function takes any number of element arguments, including zero. That is, append(L)
returns list L unchanged. If any list element is null, a null element is appended to the list in the position specified.
Function syntax:
double avg(list(double)L
) int avg(list(int)L
) long avg(list(long)L
) timestamp avg(list(timestamp)L
)
Given a list, L,
of doubles, ints, longs, or timestamps, returns the average of all members of the list. For a returned average of timestamp
values to make sense, the list should contain all interval timestamps or all absolute timestamps, but not both. The data type
of the returned value is the same as the element type of the argument list.
See also the aggregate version of avg().
Function syntax:
list(T) avg_list(list(T) LL(list(T)L1
, list(T)L2
, ...))
Accepts a list of lists as an argument, where each member list L1
, L2
, and so on of the containing list LL
must have the same element type or a coercible type. Returns a list containing the element-wise average of the elements of
each member list in LL. The list element type, T, of the returned list is the same as the element type of the argument lists.
For example:
avg_list( [ [1,2,3], [4,5,6] ] )
returns:
[2.5, 3.5, 4.5
The length of component lists can vary. Short lists are padded with nulls to match the length of the longest list seen in the current context. Null values are ignored for the computation of the average.
See also the avg_list() function for aggregates.
Function syntax:
list(T) concat(list(T)L1
, list(T)L2
[, ..., list(T)Ln
])
Returns a list produced by concatenating the elements of its argument lists in the order specified. Takes two or more list arguments, which must have the same element type. The element type, T, of the returned list is the same as the element types of the argument lists.
See also the aggregate version of concat().
Function syntax:
bool contains(list(T)L
, Titem
)
Takes two arguments: a list with element type T, and a value, item
, of the same type. Returns true
if item
is a member of the specified list, or false
if not.
If the entire list is null, returns null
. If a member of the list is null, and item
is null, returns true
. If no member of the list is null, and item
is null, returns false
.
If L
is a list of strings, item
must match an element exactly. Substrings of elements are not matched. Thus, the following expression returns false
:
'foo' in ['there','is','food','here']
The string version of contains() does match substrings, so to determine if a substring exists in a list, you can use join() to convert a list of strings to
a string. The following expression thus returns true
:
'foo' in join(' ',['there','is','food','here'])
The IN infix operator provides an alternate implementation for contains().
Function syntax:
int count_distinct_elements(list(T) L
)
Returns the number of unique elements in the argument list, L
, with any element type, T.
For example:
count_distinct_elements(list(1, 2, 3, 3, 4, 4, 5, 6, 6, 7))
returns 7.
See also the related function, unique() and the aggregate function count_distinct().
Function syntax:
list(int) count_list(list(T) LL(list(T)L1
, list(T)L2
, ...))
Accepts a list of lists as an argument, where each member list L1
, L2
, and so on of the containing list LL
must have the same element type or a coercible type. Returns a list containing an element-wise count of the corresponding
elements of each member list in LL. The list element type, T, of the returned list is the same as the element type of the
argument lists, or a coerced superset type. For example:
list_list( [ [1,2,3], [4,5,6,7,8] ] )
returns:
[2, 2, 2, 1, 1]
The length of component lists can vary. Short lists are padded with nulls to match the length of the longest list seen in the current context. Null values are ignored for the computation of the sum.
See also the count_list() function for aggregates.
Function syntax:
T dotproduct(list(T)L1
, list(T)L2
, ...)
Returns the sum of the element-wise product of its argument lists, whose element types must be int, double, or long. Accepts any number of lists, but is typically used with two list arguments. The data type of the returned value is the same as the element type of the argument lists. If the element type of the argument lists do not match, they are promoted using the rules in Data Type Coercion and Conversion.
This function produces the dot product, also known as the scalar product, of two or more numeric lists. The function multiplies the elements of each list in element order, L1.first times L2.first, then L1.second times L2.second, and so on. The resulting list of element products is summed and returned. If an argument list is null, the result is null. If any element in any argument list is null, the result is null. If an argument list is shorter than another argument list, the extra positions are filled in with 1 (or 1.0 or 1L), which preserves the extra position elements of the longer list unchanged.
Function syntax:
list(T) emptylist(T x
)
Returns an empty list with the same element type as the argument. The argument x
's value is ignored, and only its type is considered. This function is typically used like these examples:
emptylist(int())
emptylist(string())
You can also use an argument with a literal value, or an expression that resolves to a value, like the following examples. Only the type of the argument is considered:
emptylist(17)
emptylist("IBM")
Function syntax:
list(T) filterlist(inputlist(T), inFunction(arg T) {boolean function})
The filterlist
function takes a list as its first argument and a function as its second argument. The inputs to that function must be coercible
to its declared input types. If the input argument inputlist contains tuples, the number, order, and names of its fields must match exactly, and its field types must be coercible to
the field types of the input function's argument fields.
filterlist
returns a list composed of only the elements of the argument inputlist
for which the function provided in the second argument returns true
when applied to elements of inputlist
. The element type, T, of the returned list is the same as the element type of the argument inputlist
.
The input function inFunction must take in a single argument of the same type as the input list element type, and must output a boolean value. The input
function can be a built-in function or one defined using the expression language. Null values in the input list are not treated
specially and are included in the output list if the function returns true
for nulls.
Examples:
filterlist( [14,22,8,19,38,18,6,28,31,25,6], function myrange(x int) { x > 19 && x < 30} )
Returns:
[22, 28, 25]
In the next example, citytemps
is a named schema of daily temperature records with schema {city string, mintemp int, maxtemp int}. We want to filter from
this a list of tuples, each containing a city that had a temperature range of more than 20 degrees.
filterlist( list(citytemps("Phoenix", 77, 104), citytemps("Fargo", 58, 89), citytemps("Miami", 82, 96), citytemps("Seattle",
62, 73), citytemps("Boston", 54, 76)), function over20(rec citytemps) { rec.maxtemp - rec.mintemp > 20 } )
Returns:
[{"Phoenix", 77, 104}, {"Fargo", 58, 89}, {"Boston", 54, 76}]
Function syntax:
list(T) filternull(list(T) L
)
Returns a list composed of the elements of the argument list, L
, with any null elements removed. The element type, T, of the returned list is the same as the element type of the argument
list.
Function syntax:
T firstelement(list(T) L
)
Returns the first element of argument list L
. The data type, T, of the returned value is the same as the element type of the argument list.
The usage firstelement(L)
is the equivalent of L[0]
.
Function syntax (with line breaks added for clarity):
T foldleft(function(arg1 T, arg2 T, arg3 T, ... , argn T) {expression
}, arg1_initial, List<arg2> list1, List<arg3> list2, ... , List<argn> listn) T foldleft(functionname
(arg1 T, arg2 T, arg3 T, ... , argn T) {expression
}, arg1_initial, List<arg2> list1, List<arg3> list2, ... , List<argn> listn)
Executes {
to accumulate values on the elements of a set of input lists, working left to right, first to last, through the input list
elements, and returns the accumulated result. All the input lists must have the same number of elements, which must be numeric
types. Input lists that are null (expression
}nulllist(<type>)
) are treated as lists of nulls.
Although normally unnamed, the function can be given a name, as the second syntax above and the second example below illustrate.
The number of arguments to the input function must be one more than the number of input lists. The first input argument, arg1
, initializes the accumulator. Subsequent arguments must be coercible to arg1's data type, as must be the return data type.
To calculate the output, the function first sets the accumulator to the initial value of arg1
. The input function is applied on this accumulator value along with the first elements of the lists (list
) to produce a new accumulator value. The input function is then applied on this new accumulator value and the second element
of the lists (n
[0]list
) to calculate a new accumulator value, which are used with the third elements of the lists (n
[1]list
) and so on, until the end of the lists is reached. The accumulator value at this point is returned as the output.
n
[2]
Examples:
Sum the elements of one list, starting at 10.
foldleft(function (a int, b int) { a+b }, 10, list(1, 2, 3))
This evaluates to 16.
Construct a list of the maximum values of corresponding elements of three lists leftward (line breaks added for clarity):
foldleft(function maxOfLists(incoming list(int), a int, b int, c int)
{ append(incoming,(max(a, b, c))) },
emptylist(int()), list(88, 78, 65), list(45, 90, 23),
list(890, 89, 78) )
Evaluates to:
list(890, 90, 78)
using this sequence of operations:
maxOfLists([], 88, 45, 890) yields [890]
maxOfLists([890], 78, 90, 89) yields [890, 90]
maxOfLists([890, 90], 65, 23, 78) yields [890, 90, 78]
Function syntax (line break added for presentation clarity):
T foldright(function(arg1 T, arg2 T, arg3 T, ... , argn T) {expression
}, arg1_initial, List<arg2> list1, List<arg3> list2, ... , List<argn> listn) T foldright(functionname
(arg1 T, arg2 T, arg3 T, ... , argn T) {expression}, arg1_initial, List<arg2> list1, List<arg3> list2, ... , List<argn> listn)
Executes {
to accumulate values on the elements of a set of input lists, working right to left, last to first, through the input list
elements, and returns the accumulated result. All the input lists must have the same number of elements, which must be numeric
types. Input lists that are null (expression
}nulllist(<type>)
) are treated as lists of nulls.
Although normally anonymous, the function can be given a name, as the second syntax above and the second example below illustrate.
The number of arguments to the input function must be one more than the number of input lists. The first input argument, arg1
, initializes the accumulator. Subsequent arguments must be coercible to arg1's data type, as must be the return data type.
To calculate the output, the function must set the first accumulator value to initial value arg1
. The input function is applied on this accumulator value along with the last elements of the lists (list
) to produce a new accumulator value. The input function is then applied on this new accumulator value and the second-from-last
element of the lists (n
[n-1]list
) to calculate a new accumulator value, which is used with the third-from-last elements of the lists (n
[n-2]list
) and so on, until the beginning of the lists is reached. The accumulator value at this point is returned as the output.
n
[n-3]
Examples:
Sum the elements of two lists:
foldright(function (a int, b list(int), c list(int)) { a+b+c }, 0, list(1, 2, 3), list(4, 5, 6))
This evaluates to 21.
Construct a list of the maximum values of corresponding elements of three lists rightward (with line breaks added for clarity):
foldright(function maxOfLists(incoming list(int), a list(int),
b list(int), c list(int)) { append(incoming,(max(a, b, c))) },
emptylist(int()), list(88, 78, 65), list(45, 90, 23),
list(890, 89, 78) )
Evaluates to:
list(78, 90, 890)
Using this sequence of operations:
maxOfLists([], 65, 23, 78) yields [78]
maxOfLists([78], 78, 90, 89) yields [78, 90]
maxOfLists([78, 90], 88, 45, 890) yields [78, 90, 890]
Function syntax:
bool has_intersection(list(T)L1
, list(T)L2
)
Returns true if lists L1 and L2, which must have the same type, contain any values in common and false if they do not. The input lists can be of different lengths. For example:
has_intersection([2,4,6,8,10],[1,3,5])
returns false, but
has_intersection([0,],[1,3,5,0])
returns true.
See also the intersect() function, which returns the set of values that two lists have in common.
Function syntax:
int indexof(list(T)L
, Te
[, intstart
])
Returns the index of the first element of the list L
whose value is e
, starting at index start
. The argument list can have any element type, T
. The element e
must have the same element type as the argument list. For list arguments, if e
is not found, the function returns a value of null
.
If the search is successful, the value returned is always greater than or equal to start
. If start
is unspecified, it is taken to be 0, which designates the first character of the string or the first element of the list.
See also the indexof() function that operates on strings.
Function syntax:
list(T) insertelement(list(T)L
, intindex
, Te
)
Returns a list composed of the elements of the argument list, L
, with element e
inserted before index position index
in the argument list. index
is zero-based. The element type, T, of the returned list is the same as the element type of the argument list and of the
inserted element, e
.
The usage insertelement(
is the equivalent of append(). The usage L
, length(L
), e
)insertelement(
is the equivalent of prepend().
L
, 0, e
)
Function syntax:
list<T> intersect(list(T)L1
, list(T)L2
)
Returns a list containing the values common to lists L1 and L2, which must have the same type, or an empty list if they share no values in common. The input lists can be of different lengths. For example:
intersect([2,4,6,8,10,12,14,16,18,20],[3,6,9,12,15,18])
returns:
[6,12,18]
See also the has_intersection() function, which determines whether or not two lists have common values.
Function syntax:
string join(list(T)L
) string joinlist(list(T)L
) string join(stringseparator
, list(T)L
) string joinlist(stringseparator
, list(T)L
)
Returns a string composed of the elements of list L
, having any element type. With a single argument, elements are concatenated. With two arguments, the separator
character is inserted between each element. If the list
argument is empty, join() returns an empty string. If list
is null, join() returns null. If the separator
string is null, join() returns null.
You can use joinlist()
as an alias for join()
in all contexts. In StreamSQL contexts, which has the reserved word JOIN, and in LiveView LiveQL, which borrows its parser
from StreamSQL, you must either use the joinlist()
alias, or use escaped identifier syntax.
Also see the aggregate function join().
Function syntax:
T lastelement(list(T) L
)
Returns the last element of argument list L
. The data type of the returned value is the same as the element type of the argument list.
The usage lastelement(L)
is the equivalent of L[-1]
.
Function syntax:
int lastindexof(list(T)L
, Te
[, intlastStart
])
Returns the index of the last element of the list L
whose value is e
, for which the index is less than or equal to lastStart
. The argument list can have any element type, T. The element e
must have the same element type as the argument list. For list arguments, if e
is not found, the function returns a value of null
.
If the search is successful, the value returned is always less than or equal to lastStart
. If lastStart
is unspecified, it is taken to be the index of the last grapheme of the string or the last element of the list.
See also the lastindexof() function that operates on strings.
Function syntax:
int length(list L
)
Returns the number of elements in a list, including any null elements. If the argument is a null list, the returned value
is null
.
See also the length() functions that operate on lists and blobs.
Returns a list containing its arguments as elements. See list() in the Simple Functions: Type Conversions section.
Function syntax:
list(T) maplist(function(arg1 T, arg2 T, ... , argn T)
{function-body
}, list1 T, list2 T, ... , listn T)
The maplist
function maps the elements of one or more lists onto each other, according to a function() declared as the first argument.
The maplist() function takes a declared function() as its first argument and one or more lists as additional arguments. It
applies the function to the elements of each argument list, then returns a single list that contains the results of applying
the function on each element position, in element order.
To take a simple example, the following maplist() instance declares a function that has two integer arguments and a function body that adds the arguments. When applied to the input list(int) lists, the result is that the first element of the first argument list is added to the first element of the second argument list. Then the second element of the first list is added to the second element of the second list, and so on.
maplist(function(x int, y int) { x + y }, list(1, 2, 3), list(4, 5, 6)))
The result is:
(list(int) [5, 7, 9]
The input function can be defined using the expression language or a built-in function. The number of arguments to the declared
function must be at least equal to the number of input argument lists. If the argument lists have more elements than the number
of function arguments, the function-body
expression continues to be applied.
The inputs to the function — that is, the elements of the input lists — must be coercible to the declared data type of the function arguments. If an input argument is a tuple, the number, order, and names of the fields must match exactly, and the field types of the input arguments must be coercible to the field types of the input function's argument fields.
The element type of the returned list is the same as the element type of the input lists or one that is coercible from them.
Input lists with null elements are valid lists. The result element corresponding to a null element's position can be a null,
depending on the operation declared in the function-body.
The declared function must handle any nulls; the maplist() function as a whole does not do so. For example:
maplist(function(x int, y int) { x + y }, list(1, 2, 3), list(4, null, 6)))
returns:
(list(int) [5, null, 9]
To take another example:
maplist(function(n1 int, n2 int, n3 int) {n1+n2+n3}, [1,2,3,4], [2,4,6,8], [3,6,9,12])
returns:
(list(int)) [6, 12, 18, 24]
Input lists that are null (nulllist(<type>)
) are treated as lists of nulls.
Function syntax:
list(T) maplist_loose(function(T1, T2, ... , Tn){...}, list<T1>, list<T2>, ... , list<Tn>)
The maplist_loose() function takes the same function first argument and one or more lists as additional arguments as the maplist() function, and returns a list the same as maplist(). The difference is that this function allows lists of any length, padding shorter lists with nulls to the same length as the longest list encountered.
See maplist() for instructions on using this function.
Function syntax:
T maxelement(list(T) L
)
This function returns the maximum non-null element value using the greater-than relational operator across all elements in
the argument list, L
. The data type of the returned value is the same as the element type of the argument list.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
See also the simple version of max(), the aggregate version of max(), and the aggregate maxn() function. Use the simple function max(
to determine the larger of two or more lists. In an 2, list
, , list
)max(
returns the maximum list in the aggregate window. By contrast, use list
)maxelement()
to compare the element values of a single list.
Function syntax:
double median(list(double)L
) double median(list(int)L
) double median(list(long)L
)
Given a list, L,
of doubles, ints, or longs, returns a double, the median of the elements of L
. If the list has an even number of elements, returns the average of the middle two.
See also the aggregate version of median().
Function syntax:
list(T) mergelist(list<list<T>>listOfLists
) list(T) mergelist(list<list<T>>listOfLists
, boolascending
) list(T) mergelist(list<list<T>>listOfLists
, function(arg1 T, arg2 T) {intcompareFunction
}) list(T) mergelist(list<list<T>>listOfLists
, boolascending
, function(arg1
T,arg2
T) {intcompareFunction
})
The mergelist
function merges a list of sorted input lists into a single sorted output list. It takes one, two, or three arguments:
-
The single argument form takes a required list of lists, which is assumed to be presorted in the desired order.
mergelist
does not sort its input lists. -
The optional second argument is a boolean value indicating that the output is to be sorted: true for ascending or false for descending.
-
The third argument is an optional function that compares the values of two inputs (see below).
If a boolean argument is used, it must be the second argument. If absent, the output list is sorted ascending.
When the input lists consist of simple types, default StreamBase expression language comparison operators are normally adequate to obtain properly sorted output.
When the input list elements are tuples, you can provide a compare function if you need to sort on a particular field or fields
of input tuples. The compare function can be a built-in function or declared using the expression language. It must accept
two input arguments and output an integer. mergelist
expects the function's return value to be negative if the first argument is less than the second argument, positive if the
first argument is greater than the second argument, and 0 if they are equal. The inner list type must be coercible to the
function argument types.
For example, by default a list of lists of tuples with a schema {price int, volume int, symbol string}, would sort on the
first field (price). When prices are equal, the second field (volume) is used to resolve any ties, and then the third field
(symbol), as needed. If, however, you want to sort only on symbol, you can provide a function to do so, such as the following,
where mytuple
is the name of a named schema used as a data type:
function symbolCompare(t1 mytuple, t2 mytuple)
{
if (t1.symbol > t2.symbol) then 1
else if (t1.symbol < t2.symbol) then -1
else 0
}
If the input list of lists contains only one inner list, the output list's elements are the contents of that inner list in the same order. When null values exist in input lists, how they sort in the output list (like other values) is determined by the compare function. For more information see Null Literals.
A call to mergelist
that uses the above compare function to sort an enqueued list of lists might take this form (line breaks were added for clarity):
mergelist( list(list()) input.trades,
function symbolCompare(t1 mytuple, t2 mytuple)
{ if (t1.symbol > t2.symbol) then 1 else if
(t1.symbol < t2.symbol) then -1 else 0 } )
Function syntax:
T minelement(list(T) L
)
This function returns the minimum non-null value using the less-than relational operator across all elements in the argument
list, L
. The data type of the returned value is the same as the element type of the argument list.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
See also the simple version of min(), the aggregate version of min(), and the aggregate minn() function. Use the simple function min(
to determine the smaller of two lists. In an aggregate context, list
, , list
)min(
returns the minimum list in the aggregate window. By contrast, use list
)minelement()
to compare the element values of a single list.
Function syntax:
list(T) nulllist(T x
)
Notice that the name of the function has three lowercase letter L's. Returns a null list with the same element type as the
argument. The argument x
's value is ignored, and only its type is considered. This function is typically used like these examples:
nulllist(double())
nulllist(timestamp())
You can also use an argument with a value, or an expression that resolves to a value, like the following examples. Only the type of the argument is considered:
nulllist(17)
nulllist("IBM")
Function syntax:
list(T) prepend(list(T)L
, Te
)
Returns a list containing element e
at the start, then all of list L
. The data type of element e
and the element type of list L
must be the same, except that argument types int and long can be coerced to long and double following the rules in Data Type Coercion and Conversion. The element type of the returned list is the same as the element type of the argument list and of the prepended element,
e
.
Function syntax:
double product(list(double)L
) int product(list(int)L
) long product(list(long)L
)
Given a list, L
, of doubles, ints, or longs, returns the product of each member of the list. The data type of the returned value is the same
as the element type of the argument list.
See also the aggregate version of product().
Function syntax:
list(int) range(intstart
, intend
[, intstep
])
Returns a list of ints from value start
up to (or down to) but not including value end
. That is, the range includes the first item in the list and excludes the last item. Thus, range(1, 10)
returns a list with nine elements, not ten:
list(int) [1, 2, 3, 4, 5, 6, 7, 8, 9]
The default value for step
is 1. step
cannot be 0, but you can specify a step
value greater than 1 to produce a list with skipped elements. For example, a step
of 2 returns every other number:
range(0, 8, 2)
returns
list(int) [0, 2, 4, 6]
If you specify a negative value for step
, the list generates in reverse. For example:
range(3, 0, -1)
returns
list(int) [3, 2, 1]
while
range(6, 1, -2)
returns
list(int) [6, 4, 2]
When step
is negative and end
is greater than start
, the result is an empty list. Likewise, an empty list also results when start
is greater than end
and step
is positive.
Parses a string into tokens delimited by a character matching a regular expression, and returns a list of strings with the tokens as elements. See regexsplit() in the Simple Functions: Strings section.
Function syntax:
list(T) removeelement(list(T)L
, intindex
)
Returns a list composed of the elements of the argument list, L
, with the element at index position index
removed. The element type, T, of the returned list is the same as the element type of the argument list.
Function syntax:
list(T) replaceelement(list(T)L
, intindex
, Te
) list(T) replaceelement(list(T)L
,updatefunction
(arg T, index int)->T) list(T) replaceelement(list(T)L
,predicatefunction
(arg T)-> bool, replacement T)
Returns a list composed of the elements of the argument list, L
, with the element at index position index
replaced by element e
. The element type of the returned list is the same as the element type of the argument list.
In the second form, instead of specifying an index into the list of interest, specify an updatefunction
to be called with every element of the list, which function can return either the current value or some new value determined
at runtime.
In the third form, specify a predicatefunction
to be called with every element of the list, which returns replacement
for all values that match the predicate function.
Function syntax:
list(T) reverse(list(T) L
)
Returns its argument list, L
, in reverse order. The element type, T, of the returned list is the same as the element type of the argument list.
Function syntax:
list(T) shuffle(list(T) L
)
Returns its argument list, L
, with its elements shuffled in random order. Calling shuffle() again on the same list returns a different sort order. The
element type, T, of the returned list is the same as the element type of the argument list.
Function syntax:
list(T) sort(list(T)L
[, boolascending
]) list(T) sort(list(T)L
, functioncompareFunction
(arg 1 T, arg2 T)->boolean) list(T) sort(list(T)L
, bool, functioncompareFunction
(arg1 T, arg2 T)->boolean)
Returns its argument list, L
, sorted. Sorting occurs in ascending order if the optional ascending
argument is true
or omitted, or in descending order if ascending
is false
. If the ascending
argument is used, it must be the second argument. The element type, T, of the returned list is the same as the element type
of the argument list.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types. Null values sort as the lowest possible values for the element type of the list. For doubles, NaN sorts as the highest possible value, which yields the following relations: Null < –Infinity < 0 < +Infinity < NaN.
To exercise control over sorting, especially when list elements are tuples, you can provide your own comparison function that compares two input values and outputs an integer value. The output value must be negative if the first argument is less than the second argument, positive if the first argument is greater than the second argument, and 0 if they are equal.
For further details and an example of using a comparison function, see the description of the mergelist function.
Parses a string into tokens delimited by a specified character and returns a list of strings with the tokens as elements. See split() in the Simple Functions: Strings section.
Function syntax:
double stdev(list(double)L
) double stdev(list(int)L
) double stdev(list(long)L
) timestamp stdev(list(timestamp)L
)
Given a list, L,
of doubles, ints, longs, or timestamps, returns the standard deviation for all members of the list. If the argument list
is a list of doubles, ints, or longs, returns a double. If the argument list is a list of timestamps, returns an interval
timestamp. For timestamp values, the list should contain all interval timestamps or all absolute timestamps, but not both.
For lists of length 1, returns null.
See also the aggregate version of stdev().
Function syntax:
double stdevp(list(double)L
) double stdevp(list(int)L
) double stdevp(list(long)L
) timestamp stdevp(list(timestamp)L
)
Given a list, L,
of doubles, ints, longs, or timestamps, returns the standard deviation for all members of the list. If the argument list
is a list of doubles, ints, or longs, returns a double. If the argument list is a list of timestamps, returns an interval
timestamp. For timestamp values, the list should contain all interval timestamps or all absolute timestamps, but not both.
For lists of length 1, returns 0.
With stdevp()
the data provided is the entire population, while with stdev()
, the data provided is treated as a random sample. The stdevp()
function is calculated using the biased (or n) method. The stdev()
function is calculated using the unbiased (or n-1) method.
See also the aggregate version of stdevp().
Function syntax:
list(T) sublist(list(T)L
, intstart
[, intend
[, intstep
]])
Returns a list composed of a section of the argument list, L
, starting with index start
, which is included in the result list. If end
is specified, it is not included in the result list. The default value for step
is 1. You can supply an alternate step
value to skip elements in the argument list. For example, a step
of 2 returns every other value in the list. You must specify an end
value in order to specify a step
value. The element type, T, of the returned list is the same as the element type of the argument list.
In contrast with the range() function, step
cannot be negative.
Function syntax:
double sum(list(double)L
) int sum(list(int)L
) long sum(list(long)L
) timestamp sum(list(timestamp)L
)
Given a list, L,
of doubles, ints, longs, or timestamps, returns the sum of all members of the list. The element type of the returned list
is the same as the element type of the argument list.
When the list element type is timestamp, summing elements of the list follows the rules for adding timestamps as shown in the table in timestamp Data Type. That is, you cannot sum a list of two or more absolute timestamps. However, you can sum a list composed of all interval timestamps, or one composed of exactly one absolute timestamp plus one or more interval timestamps.
See also the aggregate version of sum().
Function syntax:
list(T) sum_list(list(T) LL(list(T)L1
, list(T)L2
, ...))
Accepts a list of lists as an argument, where each member list L1
, L2
, and so on of the containing list LL
must have the same element type or a coercible type. Returns a list containing the element-wise sum of the elements of each
member list in LL. The list element type, T, of the returned list is the same as the element type of the argument lists, or
a coerced superset type. For example:
sum_list( [ [1,2,3], [4,5,6] ] )
returns:
[5, 7, 9]
The length of component lists can vary. Short lists are padded with nulls to match the length of the longest list seen in the current context. Null values are ignored for the computation of the sum.
See also the sum_list() function for aggregates.
Function syntax:
list(T) unique(list(T) L
)
Returns a list composed of the unique elements in the argument list, L
, in ascending (lexicographical) order. The element type, T, of the returned list is the same as the element type of the argument
list.
For example:
unique([4, 8, 3, 7, 1, 3, 4, 4, 5, 6, 2, 6, 7])
returns:
list(int) [1, 2, 3, 4, 5, 6, 7, 8]
Be aware that lexicographic ordering of strings differs from numeric types. For example, with ints,
unique([400, 8, 3, 7, 1, 3, 4, 5, 400, 6, 6, 7])
returns:
(list(int)) [1, 3, 4, 5, 6, 7, 8, 400]
but given string tokens,
unique(['400', '8', '3', '7', '1', '3', '4', '5', '400', '6', '6', '7'])
returns a list in "dictionary" order:
(list(string)) [1, 3, 4, 400, 5, 6, 7, 8]
To find unique list elements and return them in descending order, use reverse(unique(<list>))
.
See also the related function, count_distinct_elements().
Function syntax:
tuple unzip(list(tuple) listOfTuples)
Takes a list of tuples and returns a tuple of lists.
An example will clarify the function. Let's say your application extracts a list of symbol and price tuples from a stream. One list might contain the following:
[ {prices: 102.51, symbols: AAPL}, {prices: 96.52, symbols: IBM}, {prices: 36.33, symbols: HPQ} ]
You can run this list through unzip()
, using an expression like the following:
unzip(list(tuple(102.51 as prices,'AAPL' as symbols),
tuple(96.82 as prices,'IBM' as symbols),
tuple(36.33 as prices,'HPQ' as symbols)))
The returned value is a single tuple whose fields are lists, as follows:
((prices list(double), symbols list(string))) "[102.51,96.82,36.33]","[AAPL,IBM,HPQ]"
Function syntax:
double variance(list(double)L
) double variance(list(int)L
) double variance(list(long)L
)
Given a list, L,
of doubles, ints, or longs, returns a double, the variance for all members of the list. Variance is a measure of the dispersion
of a set of data points around their mean value.
For lists of length 1, returns null. See the aggregate version of variance() for more on variance.
Function syntax:
double variancep(list(double)L
) double variancep(list(int)L
) double variancep(list(long)L
)
Given a list, L,
of doubles, ints, or longs, returns a double, the variance for all members of the list. Variance is a measure of the dispersion
of a set of data points around their mean value.
The variancep()
function is similar to the variance()
function: with variancep()
the data provided is the entire population, while with variance()
, the data provided is treated as a random sample. The variancep()
function is calculated using the biased (or n) method. The variance()
function is calculated using the unbiased (or n-1) method.
For lists of length 1, returns 0. See also the aggregate version of variancep() for more on variance.
Function syntax:
list(tuple) zip(tuple(list(T1), ..., list(Tn)) tupleOfLists)
In the spirit of the zip function in Python, zip()
takes a tuple of lists, collates it, and returns it as a list of tuples, where the n
th tuple contains the n
th element from each of the argument lists. The element types of the lists in the argument tuple can be of any type, and do
not need to be the same.
An example will clarify the function. Let's say you have a stream whose schema accepts lists of securities price data: { prices list(double), symbols list(string) }
. One tuple might contain the following:
{ prices: [102.51, 96.82, 36.33], symbols: [AAPL, IBM, HPQ] }
You can run this tuple through zip()
, using an expression like the following:
zip(tuple(list(102.51, 96.82, 36.33) as prices, list('AAPL', 'IBM', 'HPQ') as symbols))
The returned value is a list of tuples:
(list((prices double, symbols string))) [102.51,AAPL, 96.82,IBM, 36.33,HPQ]
This category includes the following functions:
The following functions provide bitwise operations:
The following functions return the maximum and minimum values for numeric data types:
Function syntax:
int abs(inte
) double abs(doublee
) long abs(longe
)
Returns the absolute value of an int, double, or long expression e
. The return type is the same as the expression's type.
Function syntax:
double acos(intx
) double acos(doublex
) double acos(longx
)
Returns in radians the arc cosine of x
, which is the value whose cosine is x
. Undefined outside of the range –1 to 1.
Function syntax:
bool andall(boolb1
, boolb
2, ...)
Takes two or more values of type boolean, or expressions that resolve to type boolean, and returns the results of a logical
AND operation on all arguments. For example, andall(true, true, false, true, false)
returns false
. Use andall() as a truth detector function.
Null arguments have special handling. A null
argument does not change the result of the evaluation if a false
value is among the arguments: andall(null, true, false, null, true)
still returns false
. However, if one or more arguments is null while all other arguments are true
, the function returns null
: andall(null, true, true, null, true)
returns null
.
The andall() function follows the logic expressed in this statement:
if any argument is false, return false
else if any argument is null, return null
else return true
See also the orall() simple function and the aggregate versions of andall() and orall().
Function syntax:
double asin(int x
)
double asin(double x)
double asin(long x)
Returns in radians the arc sine of x
, which is the value whose sine is x. Undefined outside of the range –1 to 1.
Function syntax:
double atan(intx
) double atan(doublex
) double atan(longx
)
Returns in radians the arc tangent of x
, which is the value whose tangent is x
.
Function syntax:
double atan2(intx
, inty
) double atan2(doublex
, doubley
) double atan2(longx
, longy
)
Returns in radians the arc tangent of two numeric variables, x
and y
, either of which can be int, long, or double. This is similar to calculating the arc tangent of (y/x
), except that the signs of both arguments are used to determine the quadrant of the result tangent is x
.
int bitand(intx
, inty
) long bitand(longx
, longy
)
Returns the result of a bitwise AND operation on values x
and y
. Compares each bit in x
to the corresponding bit in y
. If both bits are 1, sets the corresponding result bit to 1. Thus, bitand(3,6)
returns 2. (0011 AND 0110 returns 0010)
int bitnot(intx
) long bitnot(longx
)
Returns the result of a bitwise NOT operation on value x
. Compares each bit in x
and sets its bitwise complement in the corresponding result bit, switching 0 for 1 and 1 for 0. Thus, bitnot(5)
returns -6. (0...0101 returns 1...1010)
int bitor(intx
, inty
) long bitor(longx
, longy
)
Returns the result of a bitwise inclusive OR operation on values x
and y
. Compares each bit in x
to the corresponding bit in y
. If either bit is 1 or both bits are 1, sets the corresponding result bit to 1; otherwise, sets the corresponding result
bit to 0. Thus, bitor(12,7)
returns 15. (1100 OR 0111 returns 1111)
int bitxor(intx
, inty
) long bitxor(longx
, longy
)
Returns the result of a bitwise XOR (exclusive OR) operation on values x
and y
. Compares each bit in x
to the corresponding bit in y
. If either bit is 1 but not both bits, sets the corresponding result bit to 1; otherwise, sets the corresponding result bit
to 0. Thus, bitxor(12,7)
returns 11. (1100 OR 0111 returns 1011)
Function syntax:
double cbrt(intx
) double cbrt(doublex
) double cbrt(longx
)
Returns the (real) cube root of x
.
Function syntax:
double ceil(intx
) double ceil(doublex
) double ceil(longx
)
Returns the smallest double value that is greater than or equal to the argument and is equal to a mathematical integer. The
function name, ceil
, is shorthand for ceiling. The argument can be an int, double or long. Examples:
-
ceil(7.01)
returns8.0
. -
ceil(8.0)
returns8.0
. -
ceil(8)
returns8.0
. -
ceil(8L)
returns8.0
.
See also the related function, floor().
Function syntax:
double choose(intn
, intk
)
Returns the n choose k binomial coefficient, which represents the number of ways to choose k
unordered elements from a set of n
elements. Specify n
as a positive nonzero integer and k
as a power of n
, where k
is greater than or equal to 1 and less than or equal to n
–1. If n
< 0 or k
> n, 0.0 is returned.
Compare the permute() function.
Function syntax:
double cos(intx
) double cos(doublex
) double cos(longx
)
Returns the cosine of x
, where x
is given in radians.
Function syntax:
double cosh(intx
) double cosh(doublex
) double cosh(longx
)
Returns the hyperbolic cosine of its argument.
Function syntax:
double exp(intx
) double exp(doublex
) double exp(longx
)
Returns the value of e (the base of natural logarithms) raised to the power of x
. The argument x
is a small value of type int, double, or long.
Function syntax:
double expm1(intx
) double expm1(doublex
) double expm1(longx
)
Returns (e^x) -1. The argument x
is a small value of type int, double, or long.
Function syntax:
double factorial(int n
)
Returns the factorial of the provided integer between 0 and 170. Arguments below zero return NaN; arguments above 170 exceed the size of a double and return Infinity.
Function syntax:
double floor(intx
) double floor(doublex
) double floor(longx
)
Returns the largest double value that is less than or equal to the argument x
, and equal to a mathematical integer. The argument can be an int, double, or long. Examples:
-
floor(7.01)
returns7.0
. -
floor(8.0)
returns8.0
. -
floor(8)
returns8.0
. -
floor(8L)
returns8.0
.
See also the related function, ceil().
Function syntax:
double gaussianrandom(doublemean
, doublestdev
)
Takes two arguments of type double, the mean and standard deviation of a Gaussian distribution, and returns a double randomly chosen from that distribution.
Gaussian distributions are also known as normal distributions (in statistics) or bell curves (in testing). The function is useful for simulating data from any population that tends to follow a normal distribution.
For example, to simulate the SAT score for a random student, we could call gaussianrandom(1500, 100), where 1500 and 100 might be the national mean and standard deviation respectively.
Function syntax:
double ln(intx
) double ln(doublex
) double ln(longx
)
Returns the natural logarithm of x
.
Function syntax:
double log10(intx
) double log10(doublex
) double log10(longx
)
Returns the base-10 logarithm of x
.
Function syntax:
double log1p(intx
) double log1p(doublex
) double log1p(longx
)
Returns the natural logarithm of the sum of the argument and 1.
int lshift(intx
, inty
) long lshift(longx
, longy
)
Returns the result of a bitwise left shift of the bits in x
by the number of bit positions y
. Thus, lshift(5,1)
returns 10. (0101 returns 1010)
Function syntax:
T max(Te1
, ..., Ten
)
This function returns the maximum non-null value using the greater-than relational operator across all its arguments e
1 through en
, which must all be expressions resolving to the same data type. The data type, T, of the returned value is the same as the
arguments.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
See also the aggregate version of max(), the maxelement() function for lists, and the aggregate maxn() function. Use this simple function max(
to determine the larger of two or more lists. In an aggregate context, list
, , list
)max(
returns the maximum list in the aggregate window. By contrast, use list
)maxelement()
to compare the element values of a single list.
Function syntax:
T min(Te1
, ..., Ten
)
This function returns the minimum non-null value using the less-than relational operator across all its arguments e
1 through en
, which must all be expressions resolving to the same data type. The data type of the returned value is the same as the arguments.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
See also the aggregate version of min(), the minelement() function for lists, and the aggregate minn() function. Use this simple function min(
to determine the smaller of two or more lists. In an aggregate context, list
, , list
)min(
returns the minimum list in the aggregate window. By contrast, use list
)minelement()
to compare the element values of a single list.
Function syntax:
bool orall(boolb1
, boolb2
, ...)
Takes two or more values of type boolean, or expressions that resolve to type boolean, and returns the results of a logical
OR operation on all arguments. For example, orall(true, true, false, true, false)
returns true
. Use orall() as a falsity detector function.
Null arguments have special handling. A null argument does not change the result of the evaluation if a true
value is among the arguments: orall(null, false, true, null, false)
still returns true
. However, if one or more arguments is null
while all other arguments are false
, the function returns null
: orall(null, false, false, null, false)
returns null
.
The orall() function follows the logic expressed in this statement:
if any argument is true, return true
else if any argument is null, return null
else return false
See also the andall() simple function and the aggregate versions of andall() and orall().
Function syntax:
double permute(intn
, intk
)
Using permutation in the combinatorics sense, this function returns the number of ordered arrangements of k
distinct elements that can be select from a set of n
elements. Specify n
as a positive nonzero integer and k
as a number of elements of n
, where k
is between zero and n
. If n
< 0 or k
> n, 0.0 is returned.
Compare the choose() function.
Function syntax:
double pow(intx
, inty
) double pow(doublex
, doubley
) double pow(longx
, longy
)
Returns the value of x
raised to the power of y
. Accepts an int, double, or long, and returns a double.
Function syntax:
double random()
int random(int maxrange
)
Without arguments, returns a pseudo-random double value with positive sign, greater than or equal to 0.0 and less than 1.0.
With an integer argument maxrange
, returns a pseudo-random integer with positive sign, greater than or equal to 0 and less than maxrange
. Use this form to pick a random number between 0 and maxrange
—1, inclusive. For example, to simulate rolling a six-sided die, the expression maxrange(6) + 1
returns a random integer between 1 and 6 inclusive.
Function syntax:
long round(doublee
) long round(inte
, intn
) long round(doublee
, intn
) long round(longe
, intn
)
With one argument, a double, returns the closest long to the argument. Floating-point numbers are NOT supported.
With two arguments, round() is similar to the round() found in Microsoft Excel, but not identical to it. Specify a double, long, or int in the first argument. Specify an int in the second argument to designate the number of digits to round the first argument to. Use 0 to specify rounding to an integer, a positive integer to round the digits after the decimal point, or a negative number to specify rounding digits before the decimal point. If the first argument is an int or long, the second argument must be negative to actually perform rounding; otherwise the first argument is returned unchanged.
int rshift(intx
, inty
) long rshift(longx
, longy
)
Returns the result of a logical bitwise right shift of the bits in x
by the number of bit positions y
. This is the equivalent of the Java >>
operator. Thus, rshift(5,1)
returns 2. (0101 returns 0010)
Function syntax:
double securerandom()
A more secure form of random() that generates random bytes, converted to return a double.
Function syntax:
int sign(intx
) int sign(doublex
) int sign(longx
)
Returns the sign of its argument: –1 if less than zero, 0 if equal to zero, and 1 if greater than zero.
Function syntax:
double sin(intx
) double sin(doublex
) double sin(longx
)
Returns the sine of x
, where x
is given in radians.
Function syntax:
double sinh(intx
) double sinh(doublex
) double sinh(longx
)
Returns the hyperbolic sine of its argument.
Function syntax:
double sqrt(intx
) double sqrt(doublex
) double sqrt(longx
)
Returns the nonnegative square root of x
.
Function syntax:
double tan(intx
) double tan(doublex
) double tan(longx
)
Returns the tangent of x
, where x
is given in radians.
Function syntax:
double tanh(intx
) double tanh(doublex
) double tanh(longx
)
Returns the hyperbolic tangent of a double value.
Function syntax:
double to_degrees(double x
)
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
Function syntax:
double to_radians(double x
)
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
int unsignedrshift(intx
, inty
) long unsignedrshift(longx
, longy
)
Returns the result of a bitwise right shift of the bits in x
by the number of bit positions y
, filling the leftmost bit position with 0. This is the equivalent of the Java >>>
operator. Thus, rshift(-1,1)
returns -1 (1111 1111 1111 1111), but unsignedrshift(-1,1)
returns 2147483647 (0111 1111 1111 1111).
bool xorall(boola
, boolb
, ...)
Takes two or more values of type boolean, or expressions that resolve to type boolean, and returns the results of a logical XOR (exclusive OR) operation on all arguments.
If any argument is null, the function returns null. Otherwise, it returns true iff an odd number of arguments are true and returns false iff an even number of arguments are true. Conceptually, xorall evaluates successive arguments pairwise to produce its result.
xorall(true, true)
returns false
.
xorall(true, false)
returns true
.
xorall(false, false)
returns false
.
xorall(true, null)
returns null
.
xorall(true, true, true)
returns true
, but
xorall(true, true, true, true)
returns false
.
The xorall function follows the logic expressed in this statement:
if all arguments are false, return false
else if an odd number of arguments are true, return true
else return false
See also the andall() and orall() simple functions and the aggregate versions of andall() and orall().
This category includes the following functions:
The functions in this group detect NaN (not a number) values.
Function syntax:
bool isnan(doublevalue
) bool isnan(intvalue
) bool isnan(longvalue
) bool isnan(timestampvalue
)
Returns true
if the argument is NaN (not a number), or returns null
if the argument is null. Returns false
otherwise.
Function syntax:
bool notnan(doublevalue
) bool notnan(intvalue
) bool notnan(longvalue
) bool notnan(timestampvalue
)
Returns false
if the argument is NaN (not a number), or returns null
if the argument is null. Returns true
otherwise.
Notice that notnan()
returns the opposite of isnan()
, except if the argument is null, in which case they both return null.
This category includes the following functions:
Function syntax:
list(string) getAllHostIPs()
Returns as a list(string) all the non-loopback IP addresses assigned to the machine on which StreamBase Server is running.
Function syntax:
string getClientIP(string connectionID
)
Returns as a string the IP address associated with the specified connectionID
. Obtain the connectionID
string from the first field returned by an sbadmin listConnections command, or from the connection
stream in the system
container.
Use this function to determine by IP address the actual computer connecting to, or recently disconnected from, a running instance of StreamBase Server.
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getCompilerProperty(string sysprop
)
Returns the value of the specified system property that StreamBase Server is actually using. By contrast, getSetCompilerProperty() answers the question of whether this value was explicitly set, and systemproperty() only returns values explicitly set with -Dsome.system.property=value
.
Compare the following:
sbd -e "systemproperty('streambase.querytable.default-memory-tables')
(string) null
sbd -e "getSetCompilerProperty('streambase.querytable.default-memory-tables')
(string) UNSET
sbd -e "getCompilerProperty('streambase.querytable.default-memory-tables')
(string) MEMORY
These results tell you that the streambase.querytable.default-memory-tables
system property was not explicitly set at the command line with -D
, was not explicitly set in a property table and is therefore using the compiled-in default value, and that this default value
is MEMORY.
Function syntax:
string get_conf_param(default-value
,path1
,path2
,path3
, ...) boolean get_boolean_conf_param(default-value
,path1
,path2
,path3
, ...) int get_int_conf_param(default-value
,path1
,path2
,path3
, ...) long get_long_conf_param(default-value
,path1
,path2
,path3
, ...) double get_double_conf_param(default-value
,path1
,path2
,path3
, ...)
Returns the value portion of a name-value pair specified with a <param>
element in the currently active server configuration file, usually named sbd.sbconf
. Use the typed versions of the function to convert the returned value to the specified data type.
These functions only return values from <param>
elements. Other configuration file elements cannot be accessed with these functions.
The first argument, default-value
, specifies a value to return if parsing the configuration file returns nothing. The arguments path1
, path2
... pathn
specify element and attribute names in the XML path to the value of interest (without counting the <streambase-configuration>
top-level element).
For example, the value of the TCP port might be set in the configuration file with an entry like the following:
<server>
<param name="tcp-port" value="9900" />
</server>
To extract the TCP port value from the configuration file, use the int version of the get_conf_param()
function as follows:
get_int_conf_param(10000, "server", "tcp-port")
In another example, some of the values set in the configuration file's <security>
element are nested deeper:
<security>
<ssl-authentication>
<param name="keystore" value="../security/signed.keystore"/>
<param name="keystore-password" value="secret"/>
<param name="key-password" value="secret1"/>
</ssl-authentication>
</security>
To extract the keystore-password value, use the following function:
get_conf_param("verysecret", "security", "ssl-authentication", "keystore-password")
These functions are not expected to work at the command prompt with sbd --eval.
Function syntax:
string getContainer()
Returns as a string the name of the container in which the current application is loaded. Can be used in expressions like this example:
if getContainer() == "default" ...
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getHostName()
Returns as a string the machine name of the host on which StreamBase Server is running. Can be used in expressions like this example:
if getHostName() == "fasthost" ...
Function syntax:
string getLeadershipStatus()
Returns the string LEADER or NON_LEADER, which represent the current StreamBase Server's leadership status as a member of a high availability cluster. Servers have LEADER status by default on startup, which can be changed with the sbadmin setLeadershipStatus command.
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getNodeName()
Returns as a string the name of the node in which the current application is loaded. Can be used in expressions like this example:
if getNodeName() == "primaryserver" ...
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getParallelRoot()
Returns the qualified path name of the top-level container or Module Reference that is the root of the current parallel region. See Parallel Region Defined.
If this function is called in a simple module without concurrency settings, it returns the path name of the container that holds the operator that calls this function. If this function is called in a module whose Module Reference has a concurrency setting of Run in a parallel region, it returns the qualified path to that Module Reference. If the Module Reference also has a Multiplicity setting of multiple, and the number of instances is set to 2 or more, the returned path is appended with a colon and a zero-based integer representing the instance number of the calling module.
Thus:
-
When called in a simple module in the default container, this function returns
default
. -
When called in a module, B, that has a Module Reference named
B_ref
in the top-level module, A, this function returnsdefault.B_ref
if the Module Reference is marked with the Run in a parallel region concurrency option. -
In addition to the parallel region setting, if
B_ref
is also configured with a Multiplicity setting of 2, this function returns eitherdefault.B_ref:0
ordefault.B_ref:1
.
See StreamBase Path Notation for details on the qualified path names that can be returned.
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getPath(string componentname
)
Returns the qualified path of the specified component of a StreamBase application. Can be used for verifying the container name for the specified component at run time, in expressions like this example:
if getPath("Symbol") == "default.Symbol" ...
See StreamBase Path Notation for details on the qualified path names that can be returned.
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getServerURI()
Returns the StreamBase URI of the currently running StreamBase Server. The value returned is from the point of view of the server, and thus always returns
the hostname localhost
. Use this function to obtain the port number and any authentication parameters appended to the URI.
Use this function instead of systemproperty("streambase.uri")
to obtain the same information.
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string getSetCompilerProperty(string sysprop
)
Returns the value of the specified system property if you explicitly set it, or returns UNSET otherwise. The UNSET result means the Server is using the default value for the specified property.
By contrast, getCompilerProperty() returns the actual value of the specified property, and systemproperty() only returns values explicitly set with -Dsome.system.property=value
.
Compare the following:
sbd -e "systemproperty('streambase.querytable.default-memory-tables')
(string) null
sbd -e "getSetCompilerProperty('streambase.querytable.default-memory-tables')
(string) UNSET
sbd -e "getCompilerProperty('streambase.querytable.default-memory-tables')
(string) MEMORY
These results tell you that the streambase.querytable.default-memory-tables
system property was not explicitly set at the command line with -D
, was not explicitly set in a property table and is therefore using the compiled-in default value, and that this default value
is MEMORY.
Function syntax:
int getTableSize(string QTname
)
Returns the number of rows in the specified Query Table. When used in a Query operator that is in the process of adding or deleting table rows, the result is only approximate.
This function is not expected to work at the command prompt with sbd --eval.
This category includes the following functions:
Function syntax:
list(double) correlation_coefficient(string name, int index, list(double) var1, list(double) var2 ...)
Returns a list of correlation coefficients, containing as many members as there are list arguments (there must be at least
two). The output list of doubles is in the same order as the list arguments. Given n
variables, the index argument specifies which of those n
lists holds the primary variable (for which the returned correlation coefficient is 1.0
).
Note
-
The correlation_coefficient of 0 items is null.
-
The correlation_coefficient of 1 item is 1.0.
-
The correlation_coefficient of N identical items is 1.0.
The name argument specifies the type of correlation coefficient to compute, taken from this list:
Click any name above to see the description of the statistic. You can abbreviate and approximate the spellings you supply for the name. You can call any of these functions directly by its full name, as well as using the above syntax. However, when you call these functions directly, only two variables can be correlated per call, and the output is a scalar double.
If any input list contains null values, the function returns null for its correlation coefficient(s).
Function syntax:
double goodman_kruskal_gamma(list(double) var1, list(double) var2)
Returns the Goodman-Kruskal Gamma correlation coefficient between values of variables in two lists as a double value. The Goodman-Kruskal Gamma statistic is a rank correlation of concordant pairs. If any field values are null, the returned value is null.
To perform multivariate Goodman-Kruskal Gamma correlations, use the simple correlation_coefficient
function.
Function syntax:
double kendall_tau(list(double) var1, list(double) var2)
Returns the normalized Kendall Tau distance metric between values of variables in two lists as a double value. The lists must have the same number of elements. The Kendall Tau distance statistic counts the number of pairwise disagreements between the ranks of corresponding items in two fields. Larger values indicate greater disagreement in rank ordering, such that a return value of 0 means complete agreement and 1.0 means complete disagreement.
To perform multivariate Kendall Tau correlations, use the simple correlation_coefficient
function.
Function syntax:
double pearson(list(double) var1, list(double) var2)
Returns the Pearson product-moment correlation coefficient between values of variables in two lists as a double value that
indicates the degree of linear dependence between two variables. The Pearson statistic is defined as the covariance of the
two variables divided by the product of their standard deviations. The term product-moment refers to the mean (the first moment about the origin) of the product of the mean-adjusted random variables. A return value
of -1
indicates a perfect negative linear correlation. A value of 0
indicates no correlation (random variates). A value of 1
indicates perfect positive linear correlation. If any list elements are null, the returned value is null. If the variance
of either input variable is zero, the function returns NaN
.
To perform multivariate Pearson correlations, use the simple correlation_coefficient
function.
Function syntax:
double spearmans_rank(list(double) var1, list(double) var2)
Returns the Spearman's Rank correlation coefficient between values of variables in two lists as a double value. Spearman's
statistic is defined as the Pearson correlation coefficient between the ranked variables. A return value of -1
indicates a perfect negative correlation (reverse ordering). A value of 0
indicates no concordance of ranks. A value of 1
indicates identical rank orderings. If any list values are null, the returned value is null.
To perform multivariate Spearman's Rank correlations, use the simple correlation_coefficient
function.
This category includes the following functions:
contains() | endswith() | format() | indexof() (for strings) |
isempty() (for strings) | lastindexof() (for strings) | length() (for strings) | lower() |
ltrim() | randomstring() | regexmatch() | regexmatch_ignorecase() |
regexreplace() | regexsplit() | regextuple() | rtrim() |
split() | startswith() | string() | strlen() |
strresize() | strresizetrunc() | substr() | trim() |
upper() |
The behavior of functions that deal with strings changes when Unicode support is enabled for StreamBase Server as described in Unicode Support. These cases are noted for each function in this section.
Function syntax:
bool contains(string haystack, string needle)
Takes two arguments: a string and a search string. Returns true
if needle is a substring of haystack, or false
if not.
If either argument is null, returns null
.
The IN infix operator is an alternate syntax for contains().
Also see the list version of contains().
Function syntax:
bool endswith(String biggerString, String smallerString)
Takes two arguments: a string to search within and a (sub)string to search for. Returns true
if biggerString ends with smallerString, or false
if not.
If either argument is null, returns null
.
Function syntax:
string format(stringformat-string
,arg0
,arg1
, ...,argn
)
Returns a formatted string for arguments arg0
through argn
, each formatted according to a specification in format-string
.
Format specifications are patterns specified in the manner of class java.text.MessageFormat
.
For example, an incoming stream containing NYSE stock ticks might have a field named symbol
containing four-character stock symbols. Let's say a downstream operator expects symbols to be prefixed with a string identifying
their source stock exchange. In this case, use an expression like the following in a Map operator:
format("NYSE:{0}", symbol)
An incoming stream might have a price
field specified to four decimal places. When the flow of tuples reaches an output stream that must be formatted for human
readability, you can use an expression like the following to round the price
field to two decimal places. This example assumes that the price
field contains values of type double, which matches the format specifier ###.##.
format("{0,number,###.##}", price)
The format for argument arg0
is specified with the portion of the format-string
that defines {0}
. The format for arg1
is in the format-string
portion that defines {1}
, and so on. The data type of arg0
must match the type expected by the format-string
specifier for {0}
, and so on for each argument.
If you use format()
to modify a tuple, use {0}, {1}, {2} and so on to specify the format for each field.
See the documentation for java.text.MessageFormat
for further details and more examples.
Function syntax:random
int indexof(stringhaystack
, stringneedle
[, intstart
])
Returns the index of the first instance of the string needle
within the string haystack
, for which the index is greater than or equal to start
. For all supported Unicode character sets, this function returns the index in number of graphemes, not bytes. For string arguments, if needle
is not found, the function returns a value of -1.
If the search is successful, the value returned is always greater than or equal to start
. If start
is unspecified, it is taken to be 0, which designates the first character of the string or the first element of the list.
See also the indexof() function that operates on lists.
Function syntax:
bool isempty(String str)
Takes one string argument, str. Returns true
if str is null or empty; false
if not.
Function syntax:
int lastindexof(stringhaystack
, stringneedle
[, intlastStart
])
Returns the index within string haystack
of the rightmost instance of string needle
, for which the index is less than or equal to lastStart
. For all supported Unicode character sets, this function returns the index in number of graphemes, not bytes. For string arguments, if needle
is not found, the function returns a value of -1.
If the search is successful, the value returned is always less than or equal to lastStart
. If lastStart
is unspecified, it is taken to be the index of the last grapheme of the string or the last element of the list.
See also the lastindexof() function that operates on lists.
Function syntax:
int length(string x
)
Returns the number of graphemes in the string, for ASCII and all supported Unicode character sets. If the argument is a null string, the returned value is null
.
See also the length() functions that operate on lists and blobs.
Function syntax:
string ltrim(stringstr
) string ltrim(strings1
, strings2
)
The same as trim(), except that the first argument is trimmed on the left side only.
Function syntax:
string randomstring(int len
)
Returns a string of size len containing random ASCII characters A-Z.
Function syntax:
bool regexmatch(stringregex
, stringstr
)
Attempts to match the specified regular expression in regex
against the entire specified string in str
. To match any substring of the specified string, use wildcards before and after your regular expression. For example, the
regex IBM
matches only when the entire string is "IBM", while the regex .*IBM.*
matches if "IBM" is anywhere in the supplied string str
.
Function syntax:
bool regexmatch_ignorecase(stringregex
, stringstr
)
Same as regexmatch(), but the supplied regex
matches str
without regard to case.
Function syntax:
string regexreplace(stringinput_str
, stringregex_str
, stringreplace_str
[, intlimit
])
Accepts an input string, a regular expression pattern, a replacement string, and an optional integer, limit
. Returns input_str
modified by replacing occurrences of regex_str
in input_str
with replace_str
up to limit
times. If limit
is negative or omitted, all occurrences in input_str
are replaced.
See Oracle's Pattern class documentation for the supported regular expression syntax. The replace_str
argument can contain references to fields in a regular expression capture group, in the same way as java.util.regex.Matcher.appendReplacement().
Usage note: the similar replace() command operates on string literals, and substitutes a string literal for a literal search string. The regexreplace() command
operates on string literals, but substitutes its string literal for a regular expression search, and has support for capture
groups. It would appear at first glance that regexreplace() without the optional limit
argument is the same as replace(), but the difference in the second argument means the two commands have different semantics.
For example, consider the following commands and their results:
replace("this.string", ".", ",")
returns this,string
regexreplace("this.string", ".", ",")
returns ,,,,,,,,,,,
regexreplace("this.string", "\\.", ",")
returns this,string
To specify a capture group, surround an expression with parentheses ()
. This saves the section of the input string matching the capturing group for later recall via a backreference that you specify
in
.
replace_str
A backreference is specified in the regular expression as a dollar sign ($
) followed by a digit indicating the number of the group to be recalled, with a replacement string. For example:
regexreplace('a-b','([a-zA-Z])-([a-zA-Z])','$1_$2')
returns a_b
Function syntax:
list(string) regexsplit(stringinput_str
, stringregex_delimiter
[, intlimit
])
Use the regexsplit() function in the same ways as the split() function, except that you can specify a regular expression as the field delimiter. See Oracle's Pattern class documentation for the supported regular expression syntax.
The following example parses input_str
into list elements using either colon, semicolon, or a space as the field delimiter:
regexsplit("aaa:bbb;ccc:ddd;eee fff", "[:; ]")
returns
list(string) [aaa, bbb, ccc, ddd, eee, fff]
See split() for further details and for instructions on using the optional limit
argument.
Function syntax:
tuple regextuple(tupletuplespec
, stringregex
, sringinput
)
Use the regextuple() function to parse a string into a tuple.
For the first argument, provide a tuple specification in the form of an empty tuple. To do so, use type conversion functions in a tuple() function to define the data type of each field, and at the same time to create a null field of each type. The
first argument of the example below creates a tuple with the required schema, and with each field filled with null: ((symbol string, price double)) null,null
.
The second argument is a Java regular expression that uses named capturing groups, where the capture group names are the same as, and in the same order as, the tuple fields specified in the first argument.
The third argument is the string you want to parse into tuple format, possibly extracted from CSV or other plain text input upstream.
See Oracle's Pattern class documentation for the supported regular expression syntax.
In the following example, notice that you must escape the predefned character class strings with double backslashes:
regextuple(tuple(string() as symbol, double() as price), "(?<symbol>[A-Z]+)\\s+(?<price>\\d+(\\.\\d+)?)", "STP 42.00")
This returns:
((symbol string, price double)) STP,42.0
Tip
To run the above example at the command prompt with sbd --eval, surround the entire regextuple() function with single quotes. (The following single-line command is broken onto two lines for publication clarity.)
sbd --eval 'regextuple(tuple(string() as symbol, double() as price),
"(?<symbol>[A-Z]+)\\s+(?<price>\\d+(\\.\\d+)?)", "STP 42.00")'
Function syntax:
string replace(stringmain_str
, stringsearch_str
, stringreplace_str
)
Returns main_str
after locating all instances of search_str
and replacing them with replace_str
. The search_str
and replace_str
arguments are read as case-sensitive, literal strings. Any wildcard character is read as the literal character, not a wildcard.
Examples:
For a string field named UStext
in the incoming stream: replace(UStext, "honor", "honour")
replace('Open the pod bay doors, IBM.', 'IBM' , 'HAL')
replace('1,2,3,4,5', ',' , '-')
See also the regexreplace() command and the usage note there comparing with replace() syntax.
Function syntax:
string rtrim(stringstr
) string rtrim(strings1
, strings2
)
The same as trim(), except that the first argument is trimmed on the right side only.
Function syntax:
list(string) split(stringinput_str
, stringdelimiter
[, intlimit
])
Returns a list of strings parsed from input_str
, splitting the input string into pieces delimited by the character or phrase specified as delimiter
. The delimiter
character or phrase itself is not preserved in the resulting list when a match is found. For example:
split("aaa:bbb:ccc:ddd:eee", ":")
returns
list(string) [aaa, bbb, ccc, ddd, eee]
If the delimiter character or phrase is not found in input_str
, then input_str
is returned verbatim as the only element of the resulting list. If the delimiter character or phrase occurs adjacent to another
instance of the delimiter, the result is an empty list element. For example, split("aaa::bbb", ":")
results in the following three-element list: [aaa, ,bbb].
You can specify an optional third argument, limit
, an integer specifying the number of list elements in the result list. Use the limit argument in the following ways:
-
If
limit
is any negative number,input_str
is split into as many list elements as necessary, with no limit on the length of the list. All empty list elements are preserved, including any trailing empty elements. For example,split("aaa::bbb::", ":", -1)
results in a five-element list: [aaa, ,bbb, , ]. -
If
limit
is zero or unspecified,input_str
is split into as many list elements as necessary, but any trailing empty elements are discarded. For example,split("aaa::bbb::", ":", 0)
andsplit("aaa:bbb::", ":")
both result in a three-element list: [aaa, ,bbb]. -
If
limit
is positive,input_str
is split into at most limit-1 elements, plus one more element containing the remainder ofinput_str
. For example,split("aaa:bbb:ccc:ddd:eee", ":", 3)
results in a three-element list: [aaa, bbb, ccc:ddd:eee].Use this feature to split a string into the components you want to process, followed by a single catch-all element containing the remainder.
See also the regexsplit() function, a variation of split() that allows you to use a regular expression to specify the delimiter.
Function syntax:
bool startswith(String biggerString, String smallerString)
Takes two arguments: a string to search within and a (sub)string to search for. Returns true
if biggerString begins with smallerString, or false
if not.
If either argument is null, returns null
.
Converts its argument into a string. See string() in the Simple Functions: Type Conversions section.
Function syntax:
int strlen(string str
)
Returns the length of the string str
. This function returns the number of graphemes, not bytes, for all supported Unicode character sets.
Function syntax:
string strresize(stringstr
, intlength
)
The strresize()
function performs no action: it simply returns its str
argument and ignores the length
argument. The function is preserved for compatibility with previous StreamBase releases, in which the StreamBase string data
type had a fixed maximum size. To perform string truncation, use substr() or strresizetrunc().
Function syntax:
string strresizetrunc(stringstr
, intlength
)
Returns its argument str
, truncated to the number of bytes specified in length
.
Note
When using strresizetrunc()
with Unicode multibyte character sets, count the length
argument in bytes, not graphemes.
Function syntax:
string substr(stringstr
, intbegin
, intlength
)
Returns a substring of the string str
, starting at the zero-indexed character begin
with a size of length
characters.
For all supported Unicode character sets, specify length
in number of graphemes, not bytes.
Function syntax:
string trim(stringstr
) string trim(strings1
, strings2
)
With one argument, returns the specified string with leading and trailing spaces removed.
With two arguments, the second argument contains the characters to be removed from the string specified in the first argument.
Function syntax:
string upper(string str
)
Returns its argument in uppercase letters.
This category includes the following functions:
advanceTimeBy() |
advanceTimeTo() |
getTargetTime() |
nanotime() |
sleep() |
systemenv() |
systemproperty() |
version() |
Function syntax:
bool advanceTimeBy(long millis
)
Use this function to increment the system time of an instance of sbd running in a controlled time environment by the specified number of milliseconds.
This function only works if you have configured the current project's sbd.sbconf
file for testing purposes with elements like the following. See Using the StreamBase JUnit TimeService for further information.
<time-service-configurations>
<type>CONTROLLABLE</type>
</time-service-configurations>
Function syntax:
bool advanceTimeTo(timestamp timevalue
)
Use this function to increment the system time of an instance of sbd running in a controlled time environment to the specified timestamp's date and time.
This function only works if you have configured the current project's sbd.sbconf
file for testing purposes with elements like the following. See Using the StreamBase JUnit TimeService for further information.
<time-service-configurations>
<type>CONTROLLABLE</type>
</time-service-configurations>
Function syntax:
timestamp getTargetTime()
Use this function to retrieve a timestamp representing the time StreamBase Server thinks it is right now. This function is
useful in a controlled time environment to retrieve a custom starting point time you have set with elements like the following
in the project's sbd.sbconf
file. Notice that the getTargetTime() timestamp is the same as that returned by now(), because in a controlled time environment,
the server's sense of what current time means has been deliberately changed, and is temporarily not the same as wall clock
time.
<time-service-configurations>
<type>CONTROLLABLE</type>
<target-time>2015-10-06 12:00</target-time>
</time-service-configurations>
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
long nanotime()
Returns the current value of the system timer as a long value in nanoseconds. This function does not guarantee to return an accurate measurement of the system clock to the nanosecond level of precision, but does provide an accurate counter at nanosecond precision. Therefore, use this function only to measure elapsed time between two invocations, like the following example:
long startcount = nanotime()
// process tuple data
long elapsedtime = nanotime() - startcount
Differences in two calls to nanotime() are not accurate for a span greater than 263 (about 292 years). Do not compare nanotime() values between different machines or even across machine restart boundaries on the same machine, since the hardware time counters are reset on reboot.
Function syntax:
timestamp sleep(timestamp t
)
Pauses execution of the current thread for the specified time period. The argument must be an interval timestamp, and the
return value is the argument. This function can be useful in conjunction with operators that run in a separate thread, and
is provided for debugging or for rare uses. The sleep()
function blocks the concurrent portion of an application, if concurrency is enabled, or blocks the container holding the
application, if concurrency is not enabled. Therefore, use sleep()
with care in production applications. To learn about threading in StreamBase applications, refer to Execution Order and Concurrency.
Function syntax:
string systemenv(string key
)
Calls the standard Java method java.lang.System.getenv()
. Returns the string value of the environment variable specified by the key
argument.
Function syntax:
string systemproperty(stringkey
[, stringdefaultstring
])
Calls the standard java.lang.System.getProperty()
method. Returns the string value of the system property specified by the key
argument. Use the standard set of system property keys described in Java documentation for java.lang.System.getProperties()
(for example, os.name
, java.home
, and user.name
), or specify a StreamBase-specific Java property.
If the optional defaultstring
argument is used, and there is no system property for the specified key, the function returns the string specified in the
defaultstring
argument. If no default string is provided, null is returned.
Function syntax:
string version()
Returns as a string the three-position release number from the currently running release of TIBCO StreamBase, such as 7.5.4.
The timestamp data type can hold either an absolute timestamp or an interval timestamp.
An absolute timestamp represents a date and time. Its value is the number of seconds between the epoch and that date and time, with a maximum precision of milliseconds. The epoch is defined as midnight of January 1, 1970 UTC.
An interval timestamp represents a duration. Its value is the number of seconds in the interval, with a maximum precision of milliseconds.
Absolute timestamps are expressed in the time format patterns of the java.text.SimpleDateFormat
class and those shown in the description of the format_time() function below. Examples on this page are taken from the results of running sbd --eval commands.
See timestamp Data Type for information on the range of timestamp values, on adding and subtracting timestamp values, and on comparing timestamps with relational operators.
Timestamp functions are organized into the following categories, described in the next three sections:
This category includes the following functions, which return an absolute timestamp or a string converted from an absolute timestamp:
date() | epoch() | format_time() |
from_gmtime() | from_localtime() | from_unixtime |
now() | parse_time() | strftime() |
strptime() | today() | today_utc() |
See also the nanotime() function in the System category.
Usage Note
StreamBase offers two similar functions to convert a StreamBase timestamp to a string: format_time() and strftime(). These functions use different libraries, Java and C++ respectively, to provide support for their time format strings.
Be aware that these libraries make opposite use of the z
and Z
format strings, as shown in this table:
function | Meaning of lowercase z or %z | Meaning of uppercase Z or %Z |
---|---|---|
format_time() | Time zone abbreviation (BST, EST, PDT) | Time zone signed number (+0100, -0500, -0800) |
strftime() | Time zone signed number (+0100, -0500, -0800) | Time zone abbreviation (BST, EST, PDT) |
StreamBase also provides two similar functions for parsing a time format string in any format and converting it to a StreamBase absolute timestamp: parse_time() and strptime(). These are based on the same Java and C++ libraries, respectively.
The parse_time() function interprets the z
and Z
format strings equally. The input timestring
can be in either text, abbreviation, or signed number format, and either case of Z interprets them as time zone specifiers.
The strptime() function interprets the %z
and %Z
format strings as separate identifiers. When using strptime(), specify lowercase z
in the formatstring
if the incoming timestring
expresses the time zone as a signed number, such as +0100
, -0500
, or -0800
. Specify uppercase Z
in the formatstring
in the incoming timestring
expresses the time zone as text or a standard abbreviation: BST
, EST
, PDT
, or America/New York
.
Function syntax:
timestamp date(timestamp ts
)
Accepts a timestamp ts
and returns an absolute timestamp for the year, month, and day portion of ts
, with all time values set to zero. This, in effect, extracts and returns the timestamp for midnight local time for the date
in the argument ts
.
For example, the function now(), run on 12 Feb 2009 in the EST time zone, returns a full timestamp like this one:
(timestamp) 2009-02-12 15:48:45.679-0500
But the same function used as an argument for date():
date(now())
returns the following. Notice that all time information was changed to zero to designate midnight:
(timestamp) 2009-02-12 00:00:00.000-0500
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp epoch()
Returns an absolute timestamp representing the zero point of counted time. For both Windows and UNIX, the epoch is midnight, January 1st, 1970. The value returned is adjusted for the current time zone. To see the absolute epoch, use:
epoch() - timezoneoffset()
Because the epoch occurred during standard time in the northern hemisphere, you may need to add a one-hour offset if you are retrieving the absolute epoch during daylight savings time:
epoch() - timezoneoffset() + hours(1)
Incoming tuples from a market data provider or an exchange might include a timestamp field in the form of a long value (or integer value) representing the number of milliseconds (or number of seconds) since the epoch. Use epoch() to convert such field values to an absolute timestamp.
Many functions, including java.util.Date.getTime()
, return a long value representing the number of milliseconds since the epoch. The timestamp value in market data from many
exchanges use this format. To convert such long values, TSms
, to an absolute timestamp, use an expression like the following:
milliseconds(
TSms
) + epoch()
Functions that return UNIX time, including the UNIX date +%s
command, generally return an integer number of seconds. To convert such integer values, TSsec
, to an absolute timestamp, use an expression like the following:
seconds(
TSsec
) + epoch()
Use seconds() as above, for double values that include a fractional milliseconds component, such as the value returned from the Python
time.time()
function.
Use to_seconds() or to_milliseconds() to take a formatted timestamp string and return the number of seconds or milliseconds since the epoch, respectively.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
string format_time(timestampts
, stringformatstring
) string format_time(timestampts
, stringformatstring
, stringtz
)
Converts the specified timestamp field, ts
, to a string, using the specified formatstring
pattern. The timestamp field can then be manipulated as a string instead of a timestamp.
Optional string argument
is a time zone ID for specifying a standard name (for example tz
"UTC"
) or location (for example, "Europe/Zurich"
) of a time zone that the formatted time should indicate. When used, time zone IDs must be specified exactly and are case-sensitive.
When given an invalid timezone string, the function sets the time zone to the default, "GMT". For information on specifying
time zone IDs, see Specifying Time Zones.
This function uses the time format patterns from the SimpleDateFormat
class and summarized below.
For example, the function now(), run on 8 September 2014 in the EST time zone, returns a timestamp like this one:
(timestamp) 2014-09-08 14:36:37.235-0400
The same function used as an argument for format_time()
:
format_time(now(), "EEEE, MMM d, yyyy HH:mm zzzz")
returns the following:
(string) Monday, Sep 8, 2014 14:36 Eastern Daylight Time
If you supply a third argument, you can shift the time zone. For example, to obtain Greenwich Mean Time:
format_time(now(), "EEEE, MMM d, yyyy HH:mm zzzz","GMT")
returns:
(string) Monday, Sep 8, 2014 18:36 Greenwich Mean Time
You can use any combination of time format designators from the SimpleDateFormat
class. For example:
format_time(now(), "yyyy-MM-dd G, HH:mm a")
returns:
(string) 2009-02-12 AD, 15:48 PM
This function performs the same task as strftime()
, but uses a different library source for its format pattern strings. The strftime()
function is faster than format_time()
, and should be used when performance is critical in your application. The format pattern strings used by format_time()
are easier to use. See the Usage Note about the different ways this function and strftime() interpret the z and Z formatstring
specifiers.
The following table shows the format strings supported by format_time()
. See the SimpleDateFormat
documentation for further details.
formatstring Component
|
Meaning | Examples |
---|---|---|
a | AM or PM | AM |
d | Day in the month | 11, 31 |
D | Day in the year | 234 |
E | Day of the week. Use EEEE for the full day spelling, EEE or fewer for the abbreviation. | Friday, Fri |
F | Number of the day of the week. | 2 for Tuesday |
G | The era. | AD, BC |
h | The one-based hour on a 12-hour clock, 1 to 12. | 12 |
H | The zero-based hour on a 24-hour clock, 0 to 23. | 23 |
k | The one-based hour on a 24-hour clock, 1 to 24. | 24 |
K | The zero-based hour on a 12-hour clock, 0 to 11. | 11 |
m | The minute in the hour. | 25 |
M | The month of the year. Use MM for a two-digit month, MMM for an abbreviated month name, MMMM for the full month name. | 11, Nov, November |
s | The second in the minute. | 23 |
S | The millisecond fraction of the second | 456 |
w | The number of the week in the year. | 27 |
W | The number of the week in the month. | 3 |
y | The year. Use yy for a two-digit year, interpreted as belonging to the century 80 years prior to or 20 years ahead of the use of this function. Use more than two y's to interpret the year literally. | In 2011, yy = 11, yyyy = 2011. |
z | The time zone shown as text. Use three or fewer z's for the time zone abbreviation, four z's for the full time zone name. | EST, Eastern Standard Time |
Z | The time zone offset from UTC, shown as a positive or negative number as in RFC 822. | -0500 |
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp from_gmtime(intyear
, intmonth
, intdayOfMonth
, inthour
, intminute
, doubleseconds
)
Creates an absolute timestamp in the UTC (GMT) time zone from the specified integer components of a date and time. All six
arguments are required. You can express the seconds
argument as a double with precision down to milliseconds, or you can use an integer, which is promoted automatically to a
double by the rules of StreamBase Data Type Coercion and Conversion.
You might use this function on a stream that has the date and time broken into integer fields to concatenate and translate those fields into a single timestamp value.
Notice that the returned timestamp shows a moment in the UTC time zone, but that moment is translated and displayed in the
local time zone. For example, compare the same arguments given to from_gmtime()
and from_localtime()
when run in the EST time zone:
from_gmtime(2009, 3, 17, 18, 30, 05)
returns:
(timestamp) 2009-03-17 14:30:05.000-0400
while
from_localtime(2009, 3, 17, 18, 30, 05)
returns:
(timestamp) 2009-03-17 18:30:05.000-0400
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp from_localtime(intyear
, intmonth
, intdayOfMonth
, inthour
, intminute
, doublesecond
)
Creates an absolute timestamp in the local time zone from the specified integer components of a date and time. All six arguments
are required. You can express the seconds
argument as a double with precision down to milliseconds, or you can use an integer, which is promoted automatically to a
double by the rules of StreamBase Data Type Coercion and Conversion.
You might use this function on a stream that has the date and time broken into integer fields to concatenate and translate those fields into a single timestamp value.
For example:
from_localtime(2009, 3, 17, 18, 30, 05)
returns:
(timestamp) 2009-03-17 18:30:05.000-0400
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp from_unixtime(double u_time
)
Creates an absolute timestamp in UTC from a Unix timestamp ("POSIX time"). Unix time is the number of seconds that have elapsed in the Unix epoch (since 00:00:00 UTC, Thursday, 1 January 1970), not counting leap seconds. Consequently, the output is encoded only to four decimal digits.
For example:
from_unixtime(0.0)
returns:
(timestamp) 1969-12-31 19:00:00.000-0500
from_unixtime(1.2e9)
returns:
(timestamp) 2008-01-10 16:20:00.000-0500
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp now()
Returns an absolute timestamp value representing the current time in the local time zone.
You can specify an alternate now()
implementation in the StreamBase Server configuration file. The value is set as a child element of the <runtime>
element:
<param name="now-implementation" value="system" />
Specify one of these values, as appropriate for your StreamBase application:
Value | Meaning |
---|---|
system | Directs the now() function to use Java's System.currentTimeMillis() . This is the default.
|
thread | Directs the now() function to use a background thread that checks the time approximately every millisecond. This option results in decreased
accuracy, but may be more efficient than system if you call now() more frequently than 1000 times per second.
|
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp parse_time(stringtimestring
, stringformatstring
) timestamp parse_time(stringtimestring
, stringformatString
, stringlanguage
, stringregion
, stringvariant
)
Parses a string of time and date information and returns an absolute timestamp value. The timestring
argument can be any string representation of date and time, as long as its components are individually parsable.
You must specify a formatstring
argument to specify how the timestring
argument is to be interpreted. The format string must specify a time format pattern as defined in the SimpleDateFormat
class. See the format_time() function for a summary of the time format pattern strings. For information on specifying time zone IDs, see Specifying Time Zones.
The arguments added in the second syntax provide information to customize the timestamp for a locale, and may be necessary
if timestrings
include localized abbreviations. The Java class Locale specifies these arguments as:
-
language — lowercase two-letter ISO-639 code
-
country — uppercase two-letter ISO-3166 code
-
variant — vendor and browser specific code, such as WIN for Windows, MAC for Macintosh, and POSIX for POSIX
When using the second syntax, set any unused argument to an empty string.
Use this function to read a time and date string in any format in an incoming stream, and still interpret that string as an
absolute timestamp. For example, an incoming field, TradeTime
, might contain the time and date in the following string format:
TradeTime = "11:17 Fri Feb 13, 09"
In this case, interpret the field TradeTime
as a StreamBase timestamp with an expression like the following:
parse_time(TradeTime, "HH:mm EE MMM dd, yy")
which returns:
(timestamp) 2009-02-13 11:17:00.000-0500
The input timestring
can be in either text, abbreviation, or signed number format. If the timestring
argument includes a time zone specifier, this function interprets the zone and returns the timestamp converted to local time.
The time zone specifier can be in standard text format, such as EST
, EDT
, or PDT
, or signed number format, such as -0500
, -0800
, +0100
. You can specify either z
or Z
to match the time zone specifier. They are interpreted equally.
This function interprets the z
and Z
format strings equally. The input timestring
can be in either text, abbreviation, or signed number format, and both cases of Z interpret them as time zone specifiers.
For example, we can add Central European Time to the TradeTime
field using its signed number designation:
TradeTime = "11:17 Fri Feb 13, 09 +0100"
In this case, add a z
or Z
to the formatstring
argument:
parse_time(TradeTime, "HH:mm EE MMM dd, yy Z")
When run in the EST zone, this function returns the following. You can use the timezoneoffset() function to force the interpretation for a different time zone.
(timestamp) 2009-02-13 05:17:00.000-0500
Notice that it is inconsistent for a time string to contain an hours field based on a 24 hour clock and at the same time to
specify AM or PM in the same string. Thus, it is inconsistent to specify both HH
for the hours field and a
for AM/PM in the same format string. If your time string contains AM or PM, use lowercase hh
to interpret the hours field. StreamBase does not flag an error if you specify HH
and a
in the same format string, but the hours interpretation will be 12 hours offset for afternoon hours.
This function performs the same task as strptime()
, but uses a different library source for its format pattern strings. The strptime()
function is faster than parse_time()
, and should be used when performance is critical in your application. The format pattern strings used by parse_time()
are easier to use. See the Usage Note about the different ways this function and strptime() interpret the z and Z formatstring
specifiers.
See also the timestamp() function.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
string strftime(stringformatstring
, timestampts
) string strftime(stringformatstring
, timestampts
, stringtz
)
Converts the specified timestamp field, ts
, to a string, formatted according to the specified formatstring
pattern, for the local time zone. The timestamp field can then be displayed and manipulated as a string instead of a timestamp.
Optional string argument
is a time zone ID for specifying a standard name (for example tz
"UTC"
) or location (for example, "Europe/Zurich"
) of a time zone that the formatted time should indicate. When used, time zone IDs must be specified exactly and are case-sensitive.
For information on specifying time zone IDs, see Specifying Time Zones.
This function uses time format patterns described in the table below, which are based on the strftime library, part of the Open Group's Single UNIX Specification.
As an example of using strftime(), consider that the function now(), when run on 05 May 2014 in the EDT time zone, returns a timestamp like this one:
(timestamp) 2014-05-05 10:52:44.273-0400
Here is the same function used as an argument for strftime()
:
strftime("Traded at %H:%M and %S.%f seconds on %b %d, %Y", now())
which returns the following:
(string) Traded at 10:52 and 44.273 seconds on May 05, 2014
If you wanted the return value to be expressed in UTC time, add its time zone ID as a third argument:
strftime("Traded at %H:%M and %S.%f seconds on %b %d, %Y", now(),"UTC")
which changes the result to:
(string) Traded at 14:52 and 44.273 seconds on May 05, 2014
This function performs the same task as format_time()
, but uses a different library source for its format pattern strings. The strftime()
function is faster than format_time()
, and should be used when performance is critical in your application. The format pattern strings used by format_time()
are easier to use, but note that format_time()
does not recognize time zone ID arguments. See the Usage Note about the different ways this function and format_time()
interpret the z and Z formatstring
specifiers.
The following table shows the format strings supported by strftime()
. See the strftime library documentation for further details. An asterisk in the formatstring
column denotes format strings added by StreamBase.
formatstring Component
|
Meaning |
---|---|
%a | The locale's abbreviated weekday name: Mon, Wed. |
%A | The locale's full weekday name: Monday, Wednesday. |
%b | The locale's abbreviated month name: Jan, Nov. |
%B | The locale's full month name: January, November. |
%c | The locale's default time and date string. |
%C | The year divided by 100 and truncated to an integer: 99, 00, 02, 11 |
%d | The two-digit day of the month, from 01 to 31. |
%D | The month, day, and year with slash separator. Equivalent to specifying %m/%d/%y .
|
%e | The one- or two-digit day of the month, from 1 to 31. Like %d, except single digit days are preceded by a space. |
%f * | The milliseconds portion of timestamp. |
%F | The month, day, and year with hyphen separator. Equivalent to specifying %m-%d-%y .
|
%g | The last two digits of the year. |
%G | The four-digit year. |
%h | Same as %b. |
%H | The two-digit hour on a 24-hour clock, 00 to 23. |
%I | The two-digit hour on a 12-hour clock, 01 through 12. |
%j | The three-digit day of the year, 1 through 366. |
%k | The one or two digit hour on a 24-hour clock, with leading space if < 10. |
%l | The one or two digit hour on a 12-hour clock, with leading space if < 10. |
%m | The two-digit number of the month, 01 to 12. |
%M | The two-digit minute, from 00 to 59. |
%n | Replaced with a new line character. |
%p | The locale's equivalent of AM or PM. |
%r | The time in AM or PM notation. For the POSIX locale, the same as %I:%M:%S %p |
%R | (Not supported by the StreamBase strftime() function.)
|
%s * | The time value of timestamp ts in milliseconds divided by 1000.
|
%S | The two-digit second, from 00 to 60. (60 is available in case of leap seconds) |
%t | Replaced with a tab character. |
%T | The time, equivalent to %H:%M:%S |
%u | The numeric day of the week, 1 through 7, with Monday = 1. |
%U | The two-digit number of the week of the year, from 00 through 53. The first January Sunday begins week 1; days in the new year before that Sunday are in week 0. |
%V | The two-digit number of the week of the year, from 01 through 53, with Monday as the first day. If the first week has four or more January days, it is week 1; otherwise it is the last week of the previous year. That is, January 4th and the first Thursday of January are always in week 1. |
%w | The numeric day of the week, 0 through 6, with Sunday = 0. |
%W | The two-digit number of the week of the year, from 00 through 53. The first January Monday begins week 1; days in the new year before that Monday are in week 0. |
%x | The date, using the locale's default date format. |
%X | The time, using the locale's default time format. |
%y | The year in two digits, from 00 through 99. |
%Y | The year in four digits. |
%z | The time zone offset from UTC in ISO 8601:2000 standard format, such as +hhmm or -hhmm , or empty if no time zone can be determined.
|
%Z | The time zone name or abbreviation, or empty if no zone information is in the timestamp being converted. |
%% | Replaced by a literal %. |
%+ * | Returns the current time formatted as if you had specified "%a %b %d %H:%M:%S %Z %Y" . For example, strftime("%+", now()) returns a string like the following:Thu Feb 18 15:42:35 EST 2010 |
* These format patterns are StreamBase extensions that are either not found in the Open Group's strftime() function, or are used differently. |
Some format strings can be modified with E or O between the percent sign and the format character. See the strftime library documentation for further details.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp strptime(stringtimestring
, stringformatstring
)
Parses a string of time and date information and returns an absolute timestamp value. The timestring
argument can be any string representation of date and time, as long as its components are individually parsable.
You must also specify a formatstring
argument to specify how the timestring
argument is to be interpreted. The format string must specify a time format pattern using a subset of the patterns defined
for the strptime library (part of the Open Group's Single UNIX Specification), as listed in the table below.
This function interprets the %z
and %Z
format strings as separate identifiers. Specify lowercase z
in the formatstring
if the incoming timestring
expresses the time zone as a signed number, such as +0100
, -0500
, or -0800
. Specify uppercase Z
in the formatstring
in the incoming timestring
expresses the time zone as text or a standard abbreviation: BST
, EST
, PDT
, or America/New York
.
Use this function to read a time and date string in any format in an incoming stream, and still interpret that string as an
absolute timestamp. For example, an incoming field, TradeTime
, might contain the time and date in the following string format:
TradeTime = "15-51-26.984 May 5, 14"
In this case, interpret the field TradeTime
as a StreamBase timestamp with an expression like the following:
strptime(TradeTime, "%H-%M-%S.%f %b %d, %y")
which returns:
(timestamp) 2014-05-05 15:51:26.984-0400
This function performs the same task as parse_time()
, but uses a different library source for its format pattern strings. This strptime()
function is faster than parse_time()
, and should be used when performance is critical in your application. The format pattern strings used by parse_time()
are easier to use. See the Usage Note about the different ways this function and parse_time() interpret the z and Z formatstring
specifiers.
See also the timestamp() function, and strpinterval(), a function similar to strptime()
, but for interval timestamps.
The following table shows the format strings supported by strptime()
. See the strptime library documentation for further details.
formatstring Component
|
Meaning |
---|---|
%b | The month, using the current locale's month names. Abbreviated and full month names are recognized. |
%B | Same as %b. |
%d | The day of the month, from 1 to 31. |
%D | The month, day, and year with slash separator. Equivalent to specifying %m/%d/%y .
|
%f * | The milliseconds portion of the seconds field. |
%H | The hour, using a 12-hour clock, 0 to 11, or a 24-hour clock, 0 to 23. If the incoming string expresses hours on a 12-hour clock, be sure to include the %p format string. If an incoming string's hours value is from 12 to 23 and the format string includes %p, the %p is ignored. |
%m | The number of the month, 1 to 12. |
%M | The minute, from 0 to 59. |
%p | The strings AM or PM. |
%S | The second, from 0 to 59. |
%x | The date, using the locale's default date format. |
%y | The year in two digits. Values in the range 69 to 99, inclusive, are interpreted as 1969 through 1999. Values in the range 00 through 68, inclusive, are interpreted as 2000 through 2068. |
%Y | The year in four digits. |
%z * | A time zone designator in the form +hhmm , -hhmm , +hh:mm , or -hh:mm . Examples: -0500, +1430, -08:00
|
%Z * | A standard time zone string or abbreviation, such as EST , GMT-05:00 , America/Los_Angeles .
|
! | Use an exclamation point in the format string to indicate that the remainder of the input string is optional. |
For %d, %H, %m, %M, and %S, leading zeros are recognized but not required. | |
* These format patterns are StreamBase extensions not found in the Open Group's strptime() function, or are used differently. |
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp today()
Returns an absolute timestamp value for the beginning of the current day (midnight) in the local time zone.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp today_utc()
Returns an absolute timestamp value for the beginning of the current day (midnight) in the UTC (GMT) time zone, translated for the local time zone.
Thus, when today()
and today_utc()
are run in sequence in the US Eastern time zone during standard time, they return:
2009-02-13 00:00:00.000-0500 2009-02-12 19:00:00.000-0500
The expressions today()
and today_utc() - timezoneoffset()
return the same value:
2009-02-13 00:00:00.000-0500 2009-02-13 00:00:00.000-0500
On UNIX, today_utc()
requires the TZ environment variable to be set in order to translate correctly to the local time zone. On Windows, the TZ
environment variable is ignored if set.
This category includes the following functions, which return an interval timestamp:
days() | hours() | interval() |
isinterval() | milliseconds() | minutes() |
seconds() | strpinterval() | time() |
timezoneoffset() | weeks() |
Function syntax:
timestamp days(intx
) timestamp days(doublex
)
Returns an interval timestamp representing an interval of x
days.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp hours(intx
) timestamp hours(doublex
)
Returns an interval timestamp representing an interval of x
hours.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp interval(string formatstring
)
Parses the argument formatstring
as a time format string and returns an interval timestamp. This function uses the time format patterns from strftime library, described in the Open Group's Single UNIX Specification. See the table of format patterns listed for the strftime() function.
The time format string is in the form %H:%M:%S!.!%f
. The exclamation points show where the remainder of the string is optional. Thus, you can enter strings in any of the following
formats, where hh = hours, mm = minutes, ss = seconds, and ff = a fractional second value in milliseconds:
-
hh:mm:ss
-
hh:mm:ss.
-
hh:mm:ss.ff
For example:
interval("2:00:01")
interval("2:00:01.")
both return (timestamp) 7201.0
, the number of seconds in the specification of two hours plus one second.
interval("2:00:01.333")
returns (timestamp) 7201.333
, the number of seconds in the specification of two hours plus one and a third seconds.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
bool isinterval(timestamp ts
)
Returns true if the argument ts
is an interval timestamp. See Simple Functions: Timestamp Overview for a discussion of interval versus absolute timestamp types.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp milliseconds(intx
) timestamp milliseconds(doublex
)
Returns an interval timestamp representing an interval of x
milliseconds.
Use this function in conjunction with epoch() to convert a tuple field value representing the number of milliseconds since the epoch to an absolute timestamp. See epoch().
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp minutes(intx
) timestamp minutes(doublex
)
Returns an interval timestamp representing an interval of x
minutes.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp seconds(intx
) timestamp seconds(doublex
)
Returns an interval timestamp representing an interval of x
seconds.
Use this function in conjunction with epoch() to convert a tuple field value representing the number of seconds since the epoch to an absolute timestamp. See epoch().
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp strpinterval(stringtimestring
, stringformatstring
)
Parses a string of time interval information and returns an interval timestamp value. The timestring
argument can be any string representation of a time interval, as long as its components are individually parsable.
You must also specify a formatstring
argument to specify how the timestring
argument is to be interpreted. The format string must specify a time format pattern using the following subset of the patterns
defined for the strptime library (part of the Open Group's Single UNIX Specification):
formatstring Component
|
Meaning |
---|---|
%f | The milliseconds portion of the seconds field. (This is a StreamBase extension not found in the strptime library.) |
%H | The hour, using a 24-hour clock, 0 to 23. |
%M | The minute, from 0 to 59. |
%S | The second, from 0 to 59. |
! | Use an exclamation point in the format string to indicate that the remainder of the input string is optional. |
For %H, %M, and %S, leading zeros are recognized but not required. |
Use this function to read a string representing a time interval in an incoming stream, and still interpret that string as
an interval timestamp. For example, an incoming field, ElapsedTime
, might contain a time interval expressed in following string format:
ElapsedTime = "1, 7, 5.250"
which you know from the data vendor's documentation means one hour and 7 minutes, plus five and a quarter seconds. In this
case, interpret the field ElapsedTime
as a StreamBase interval timestamp with an expression like the following:
strptime(ElapsedTime, "%H, %M, %S.%f")
which returns:
(timestamp) 4025.25
If the ElapsedTime
field in the incoming stream intermittently includes the seconds, use an exclamation point to show that everything after
the minutes pattern is optional:
strptime(ElapsedTime, "%H, %M!, %S.%f")
Then when a new ElapsedTime
value comes through set to "2, 4"
, it is still interpreted and converted to an interval timestamp:
(timestamp) 7440.0
See also the strptime(), which performs for absolute timestamps what this function does for interval timestamps.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp time(timestamp ts
)
Returns as an interval timestamp the time of day expressed in the argument, ts
– that is, the number of seconds between the start of the day expressed by ts
and the moment expressed by ts
. Thus, if run at the same instant, the following expressions return the same result: time(now())
and now() - today()
.
Back to Top ^ Back to Timestamp Overview ^
timestamp timezoneoffset()
timestamp timezoneoffset(timestamp ts
)
Without an argument, returns a signed interval timestamp that expresses in seconds the offset of the local time zone from
UTC. Thus, timezoneoffset()
returns as an interval the same value that format_time(now(), "Z")
returns as a string. Because interval timestamps are expressed in seconds, you can divide the results by 3600 to obtain for
display purposes the number of hours in the offset.
The returned time zone offset value is calculated relative to the system time for Windows, and relative to the system time
or to the time zone specified in the TZ
environment variable for UNIX. On UNIX, make sure that the environment from which the sbd process is launched either does not have a TZ
variable or that it is correctly set for the local system's time zone. The TZ
environment variable is ignored on Windows.
You can subtract the return of timezoneoffset()
from any absolute timestamp value to generate a new absolute timestamp that represents the same moment in the UTC time zone.
For example, now() - timezoneoffset()
returns the present moment in the UTC time zone, and returns the same result when run from any time zone.
Because the value returned from timezoneoffset()
is signed, you do not need to consider whether to add or subtract a positive or negative offset value from the timestamp
of interest. Always use a subtraction operation, which insures that negative timezoneoffset()
values (that is, those from time zones west of UTC) are effectively added, while positive values (from time zones east of
UTC) are subtracted.
Once you have a timestamp value representing UTC time, you can convert it to a timestamp for another time zone by adding or subtracting a fixed value representing the offset of the time zone of interest. For example, the following expression returns the current time for Sydney, Australia for April to October, when Sydney is on standard time:
(now() - timezoneoffset()) - hours(10.0)
(For October to April, when Sydney is on daylight savings time, use hours(11.0)
.)
If you provide an absolute timestamp as an argument, timezoneoffset()
returns the UTC offset of the system's current time zone at the date and time of the specified timestamp. The offset is calculated
based only on the date and time information in the specified timestamp, relative to the system's current time zone (or, on
UNIX, to the zone in TZ
). You can use the argument form of timezoneoffset()
to determine the UTC offset for future or historical dates.
For example, the following commands show that on the US East Coast, the UTC offset changed at 2:00 AM on March 8, 2009, when daylight savings time began.
sbd --eval "timezoneoffset(timestamp('2009-03-08 01:59:59'))/3600"
timestamp (-5.0)
sbd --eval "timezoneoffset(timestamp('2009-03-08 02:00:00'))/3600"
timestamp (-4.0)
sbd --eval "timezoneoffset(timestamp('2009-03-08 03:00:00'))/3600"
timestamp (-4.0)
(Actually, at 2:00 AM clocks were turned ahead one hour to 3:00 AM, but timezoneoffset()
correctly calculates the UTC offset for both 2:00 and 3:00 AM.)
You can also extract specific fields of a timestamp in either local time or for a specific time zone using functions such
as get_hour()
, which take a string time zone ID as an optional argument. For more information, see Timestamp Functions: Timestamp Fields.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp weeks(intx
) timestamp weeks(doublex
)
Returns an interval timestamp representing an interval of x
weeks.
This category includes the following functions, used to get and set individual fields in absolute timestamps:
Function syntax:
double get_millisecond(timestamp ts
)
Returns the milliseconds portion of the absolute timestamp ts
. The returned value is a double, and includes only the fractional value of the milliseconds of the current second.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
double get_second(timestampts
) double get_second(timestampts
, stringtz
)
Returns the seconds portion of the absolute timestamp ts
for local time zone. The returned value is a double, and includes a fractional value in milliseconds.
To return a value for a particular time zone, provide the text of its ID as a second argument, tz
, which is case-sensitive. Unrecognized time zone IDs default to UTC. For information on specifying time zone IDs, see Specifying Time Zones.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_minute(timestampts
) int get_minute(timestampts
, stringtz
)
Returns the minutes portion of the absolute timestamp ts
for local time zone.
To return a value for a particular time zone, provide the text of its ID as a second argument, tz
, which is case-sensitive. Unrecognized time zone IDs default to UTC. For information on specifying time zone IDs, see Specifying Time Zones.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_hour(timestampts
) int get_hour(timestampts
, stringtz
)
Returns the hours portion of the absolute timestamp ts
for local time zone.
To return a value for a particular time zone, provide the text of its ID as a second argument, tz
, which is case-sensitive. Unrecognized time zone IDs default to UTC. For information on specifying time zone IDs, see Specifying Time Zones.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_day_of_week(timestamp ts
)
int get_day_of_week(int year, int month, int day)
Returns an integer representing the day of the week given an absolute timestamp ts
for local time zone or three integers representing a year, month and day. The returned value is between 0
(Sunday) and 6
(Saturday).
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_day_of_month(timestamp ts
)
Returns an integer representing the day of the month in the absolute timestamp ts
for local time zone. The returned value is between 1
and 31
(is one-based).
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_day_of_year()
int get_day_of_year(int year, int month, int day)
int get_day_of_year(timestamp ts
)
Returns an integer representing the Julian day of the year for local time zone. The returned value is between 1
and 365
(or 366
for leap years).
get_day_of_year
with three integer arguments or one timestamp argument returns the Julian date for the specified year, month and day.
get_day_of_year
with no arguments is equivalent to get_day_of_year(today())
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_month(timestampts
) int get_month(timestampts
, stringtz
)
Returns an integer representing the month in the absolute timestamp ts
for local time zone. The returned value is between 1
(January) and 12
(December).
To return a value for a particular time zone, provide the text of its ID as a second argument, tz
, which is case-sensitive. Unrecognized time zone IDs default to UTC. For information on specifying time zone IDs, see Specifying Time Zones.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
int get_year(timestampts
) int get_year(timestampts
, stringtz
)
Returns the year portion of the absolute timestamp ts
for local time zone.
To return a value for a particular time zone, provide the text of its ID as a second argument, tz
, which is case-sensitive. Unrecognized time zone IDs default to UTC. For information on specifying time zone IDs, see Specifying Time Zones.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_second(timestampts
, intx
) timestamp set_second(timestampts
, longx
) timestamp set_second(timestampts
, doublex
)
Returns a modified version of the absolute timestamp t
s, with the seconds component replaced with x
. If you supply an int or long value for x
, the milliseconds portion of x
is set to 000. If you supply a double value, you can replace the seconds component down to the millisecond.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_minute(timestampts
, intx
)
Returns a modified version of the absolute timestamp t
s, with the minutes component replaced with x
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_hour(timestampts
, intx
)
Returns a modified version of the absolute timestamp t
s, with the hours component replaced with x
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_day_of_week(timestampts
, intx
)
Returns a new timestamp that has the same value as absolute timestamp t
s, but with the day of the week component replaced with the value of x
. Days of the week are numbered from 0
to 6
, Sunday through Saturday. Day 7 is also considered to be Sunday.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_day_of_month(timestampts
, intx
)
Returns a modified version of the absolute timestamp t
s, with the day of month component replaced with x
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_month(timestampts
, intx
)
Returns a modified version of the absolute timestamp t
s, with the month component replaced with x
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
timestamp set_year(timestampts
, intx
)
Returns a modified version of the absolute timestamp t
s, with the year component replaced with x
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
long to_milliseconds(timestamp ts
)
Converts an interval or absolute timestamp to a long value representing milliseconds. This function is similar to to_seconds()
, but is more precise and faster. If the argument ts
is an interval, returns the number of milliseconds in the interval. For example, to_milliseconds(now() - today())
returns the number of milliseconds since midnight.
If the argument ts
is an absolute time, returns the number of milliseconds since the epoch, Jan 1, 1970. For example, to_milliseconds(timestamp("2007-11-29 17:32"))
returns 1196375520000
.
Back to Top ^ Back to Timestamp Overview ^
Function syntax:
double to_seconds(timestamp ts
)
Converts an interval or absolute timestamp to a double in seconds. If the argument ts
is an interval, returns the number of seconds in the interval. For example, to_seconds(now() - today())
returns the number of seconds since midnight.
If the argument ts
is an absolute time, returns the number of seconds since the epoch, Jan 1, 1970. For example, to_seconds(timestamp("2007-11-29 17:32"))
returns 1.19637552E9
.
This category includes the following functions:
blob() | bool() | double() |
int() | list() | long() |
named schema constructor function | string() | timestamp() |
tuple() |
Function syntax:
blob blob(string arg) blob blob()
Converts arg
to a value of type blob where arg
is a string. Returns a blob representation of the string. With no argument, returns a null blob as if you had used blob(null)
.
For additional conversion functions involving blobs, see Simple Functions: BSON.
Function syntax:
bool bool(bool arg) bool bool(double arg) bool bool(int arg) bool bool(long arg) bool bool(string arg) bool bool(blob arg) bool bool()
Converts arg
to a value of type bool where arg
is a bool, double, int, long, or string.
-
An int argument of 0 returns the Boolean
false
. All other int values, positive or negative, returntrue
. -
A double argument of 0 returns the Boolean
false
. All other double values, positive or negative, returntrue
. -
A string argument
false
returns a Booleanfalse
and the stringtrue
returns Booleantrue
. The string is compared in a case-insensitive manner. For example,bool("FALSE")
andbool("false")
are equivalent. Any string other thanfalse
ortrue
is an error. -
A blob argument returns the content of a BSON blob containing a bool
-
A bool value returns itself.
-
With no argument, returns a null bool as if you had used
bool(null)
.
Function syntax:
double double(bool arg) double double(double arg) double double(int arg) double double(long arg) double double(string arg) double double(blob arg) double double()
Converts arg
to a value of type double where arg
is a bool, double, int, long, or string.
-
A bool argument returns
1.0
if true, or0.0
if false. -
A double argument returns itself.
-
An int argument returns the same value converted to type double. For example,
double(3)
returns3.0
. -
A long argument returns the same value converted to type double. For example, double(15L) returns 15.0.
-
A string argument is parsed as a decimal number. For example,
double("123.456")
returns123.456
, butdouble("7abc")
returns an error. Scientific notation is supported, sodouble("1.2E4")
returns12000.0
. -
A blob argument returns the content of a BSON blob containing a double
-
With no argument, returns a null double as if you had used
double(null)
.
Also See: The to_seconds() function, used to convert timestamps to double.
Function syntax:
int int(bool arg) int int(double arg) int int(int arg) int int(long arg) int int(string arg) int int(blob arg) int int()
Converts arg
to a value of type int where arg
is a bool, double, int, long, or string.
-
A bool argument returns
1
if true, or0
if false. -
A double argument has its fractional part truncated. For example,
int(3.4)
returns3
. -
An int argument returns itself.
-
A long argument returns the int same as a long value. For example, int(150L) returns 150.
-
A string argument is parsed as a decimal number. For example,
int("123")
returns123
, butint("7abc")
is an error. -
A blob argument returns the content of a BSON blob containing an int
-
With no argument, returns a null int as if you had used
int(null)
.
Function syntax:
list list(e1
,e2
,e3
, ...)
Returns a list containing each of the specified elements e1
, e2
, and so on, where each specified element is of the same data type. The specified elements can be literals or expressions,
as long as each resolves to (or can be coerced to) the same data type. You can create a list of any StreamBase data type.
Use emptylist() to return an empty list, and use nulllist() to return a null list. (See Null Lists for a discussion of null lists compared to empty lists.)
Function syntax:
long long(boolarg
) long long(doublearg
) long long(intarg
) long long(longarg
) long long(blobarg
) long long()
Converts expression arg
to a value of type long where arg
evaluates to a bool, double, int, or long.
-
An int argument returns the same value, but with type long.
-
A double argument has its fractional part truncated, and the remainder returned with type long.
-
A bool argument returns
1
if true, or0
if false. -
A blob argument returns the content of a BSON blob containing a long
-
A long value returns itself.
-
With no argument, returns a null long as if you had used
long(null)
.
Function syntax:
tupleschema_name
(fieldspec
[,fieldspec2
...]) [as tuplename]
For every named schema you define or import into a module, StreamBase automatically generates a new function in the StreamBase expression language that allows you to construct tuples with that schema. The generated function's name is the name of the named schema, and it takes one or more comma-separated expressions as value arguments. The result is a single tuple with the same schema as the named schema.
For example, for the named schema point
, whose schema is (x double, y double)
, you can use a function named point()
anywhere in the module that defines the named schema:
point(32.0, 44.5)
point(32.0 as x, 44.5 as y)
In StreamSQL, a final AS keyword is required to name the resulting tuple:
point(32.0, 44.5) AS p2
point(32.0 as x, 44.5 as y) AS p2
You can specify the list of fieldspec
field specifications in two ways:
-
By position
-
By name
The following examples presume a named schema with the name Quotes
, having the following schema:
(Symbol string, Price double, NumShares int)
When specifying by position, you do not need to specify AS keywords and field names, but you must specify all fields in schema order. For example:
Quotes("IBM", 132.42, 5000)
When specifying by name, you must use the AS keyword to specify field names, which must exactly match the spelling of the named schema's fields. Any fields you do not specify are automatically filled with nulls:
Quotes("IBM" as Symbol, 132.42 as Price)
When specifying by name, you can specify fields in any order:
Quotes(132.42 as Price, "IBM" as Symbol)
You cannot mix field specifications with and without the AS keyword.
Specifying fields by name provides flexibility when you use the same named schema in several places in a large application. If you later edit the named schema itself to add a new field, you do not need to add the new field everywhere the schema is used. If the fields are specified by name in all locations, the new field does not cause an error condition, and is automatically filled with null.
Use the name of a named schema with no arguments to construct a null tuple (which is not the same as a tuple whose fields
are all null). For example: point(), Quotes()
Avoid naming a schema the same as an existing expression language function. If you do, the generated constructor function would mask and prevent the use of the same-named built-in function. See Named Schema Name Collisions.
Use the * AS *
syntax for tuples defined with a named schema to copy the entire tuple into a single field of type tuple.
For example, let's say the tuple arriving at the input port of a Map operator was defined upstream with the NYSE_FeedSchema
named schema. To preserve the input tuple unmodified for separate processing, the Map operator could add a field of type
tuple using settings like the following in the Additional Expressions grid. When using the * AS *
syntax in the Expression column, the name of the tuple field in the Field Name column has an implied asterisk for all of its fields.
Action | Field Name | Expression |
---|---|---|
Add | OriginalOrder | NYSE_FeedSchema(input1.* as *) |
Because the Map operator has only one input port, the port does not need to be named:
Action | Field Name | Expression |
---|---|---|
Add | OriginalOrder | NYSE_FeedSchema(* as *) |
Function syntax:
string string(blobarg
) string string(boolarg
) string string(doublearg
) string string(intarg
) string string(longarg
) string string(stringarg
) string string(timestamparg
) string string()
Converts arg
to a value of type string where arg
is a blob, bool, double, int, long, string, or timestamp.
-
An int, double or long argument returns the string representation of
arg
. -
A string argument returns itself.
-
A bool argument returns the string
true
if true, orfalse
if false. -
A timestamp argument, if represented as a date and time, returns a string that includes the explicit time zone. A timestamp represented as a numeric value returns the string representation of that value.
-
A blob argument returns the string representation of the blob.
-
With no argument, returns a null string as if you had used
string(null)
.
Function syntax:
timestamp timestamp(string timestring
)
timestamp timestamp()
Returns an absolute timestamp parsed from the argument timestring
, a string representation of a date with optional time. With no argument, returns a null timestamp as if you had used timestamp(null)
.
The timestring
argument must be in the form of a time format pattern as defined in the java.text.SimpleDateFormat
class. See the format_time() function for a summary of the time format pattern strings.
You can specify only the date portion:
timestamp("2009-02-12")
If you specify any part of the time value, you must specify at least the hours and minutes:
timestamp("2009-02-12 14:28")
or timestamp("2009-02-12 14:28:05.250")
You can also specify the entire SimpleDateFormat
string, including time zone specification:
timestamp("2009-02-12 14:28:05.250-0800")
See also the strptime() and parse_time() functions, used to convert strings to timestamps.
Function syntax:
tuple tuple(value
asfieldname
[,value2
asfieldname2
[,...]]) [as tuplename] tuple tuple()
Takes one or more comma-separated value expressions as arguments, each field named with the AS keyword. The result is a single tuple with the specified contents. For example:
tuple(126.50 as price, "IBM" as symbol)
In StreamSQL, you must specify a name for the resulting tuple with a final AS keyword:
tuple(126.50 as price, "IBM" as symbol) as tick1;
When using tuple() to create individual tuples, to be compliant with a particular schema, you must specify all fields in the schema's field order. The tuple() function has no knowledge of the schema you are trying to match. For these reasons, we recommend using named schemas and the named schema constructor function.
You can duplicate any tuple field into another field of type tuple without using wildcards. For example, a Map operator might
have an entry like the following in its Additional Expressions grid, where both IncomingTuple
and CopyOfIncomingTuple
are the names of tuple fields:
Action | Field Name | Expression |
---|---|---|
Add | CopyOfIncomingTuple | IncomingTuple |
Use the .*
syntax to flatten a tuple field into the top level of a stream.
For example, a Map operator might define an entry like the following in its Additional Expressions grid. When using this syntax, you must have an asterisk in both Field Name and Expression columns.
Action | Field Name | Expression |
---|---|---|
Add | * | IncomingTuple.* |
This category includes the following functions:
Function syntax:
T coalesce(Targ1
, ..., Targn
)
Returns the first non-null argument, or a null value if all arguments are null. Accepts all data types, but all arguments must have the same type. The returned value has the same data type as the arguments. For tuple arguments, returns the first tuple that is non-null. This function stops evaluating arguments after the first non-null argument is detected.
Since literal values are never null, you can specify a literal value as the last argument to provide an effective default
value for the list. In this way, coalesce()
can be used to emulate the NVL()
function provided by Oracle PL/SQL and the two-argument ISNULL() function provided by Microsoft T/SQL. For example, the following
expression returns the value of fieldA
if it is non-null, or 0
if it is null:
coalesce(fieldA, 0)
The following example returns the first non-null field among fieldA
, fieldB
, and fieldC
in that order, or returns -99999
if all three fields are null:
coalesce(fieldA, fieldB, fieldC, -99999)
Function syntax:
T coalesce_tuples(Targ1
, ..., Targn
)
For non-hierarchical data types, coalesce_tuples() acts the same as the coalesce() function. For hierarchical types, this function coalesces each sub-field, recursing into further nested fields as necessary.
For example, the following function:
coalesce_tuples( tuple(double() AS x, 1.2 AS y, double() AS z), tuple(1.0 AS x, 1.3 AS y, double() AS z), tuple(double() AS x, double() AS y, 1.4 AS z) )
returns the following tuple:
((x double, y double, z double)) 1.0, 1.2, 1.4
Remember that double() evaluates to double(null). Thus, the coalesce_tuples
function in this example selected:
-
The first non-null value for x, 1.0, which was found in the second input tuple.
-
The first non-null value for y, 1.2, which was found in the first input tuple.
-
The first non-null value for z, 1.4, which was found in the third input tuple.
Function syntax:
function compile(stringexpr
, tuplevarfields
, Ttypespec
) function compile(stringexpr
, tuplevarfields
, Ttypespec
, stringname
) function compile(stringexpr
, tuplevarfields
, Ttypespec
, stringname
, tupleenvironment
)
Converts a string defining an expression and a tuple defining its arguments into an executable StreamBase function at runtime. Compiles the function body provided as the first argument, providing it with arguments specified in the second argument. The third argument defines the data type the function is to return when executed.
The first argument, expr
, can be a string literal, a string constant, or the name of a field passed in from a stream. The second argument, varfields
, can define a tuple using the tuple() function or be a named schema or parameter specifying a tuple. Typical usage is to provide the data types and field names
that are to be used in the supplied expression, like:
tuple(double() as factor)
More generally, if the context that the function wil execute in has the same fields as the current input fields, you can specify all current field names:
tuple(* as *)
Always provide the third argument, typespec
, to define the data type that the function returns, which can be any StreamBase data type, such as int(), double(), list(),
string(), or even function().
As an example, compile a function that converts temperature in degrees Fahrenheit to degrees Celsius:
compile( "(fahr_temp - 32) * 1.8", tuple(double() as fahr_temp), double() )
The second syntax adds string name
, an optional name for the function. Specify a name if the function is recursive, so that it can refer to itself. StreamBase
expression language identifier naming rules apply, and the scope of the identifier is the function itself.
The third syntax adds tuple environment
, which provides a set of fields that the function expression can use as names and values for local variables. When using
this syntax, you can specify an empty name
argument ("") if it is not referenced within the function body.
You can define an expression within a stream's schema as a string and use it to define a function. For example, the input
stream could contain a string field named converter with the value "(fahr_temp - 32) * 1.8"
. The above example could then be:
compile( converter, tuple(double() as fahr_temp), double() )
You could generalize the above example to convert Fahrenheit temperatures to either Celsius or Kelvin units by specifying
an environment
variable, which could either be defined on the input stream or as a constant. If the value of that variable is 0.0, the function
returns degrees Celsius. If its value is 273.15, the function returns degrees Kelvin. Defining converter as "(fahr_temp - 32) * 1.8 + k_offset"
and providing name
and environment
arguments changes the call to compile to:
compile( converter, tuple(double() as fahr_temp), double(), "", k_offset )
When environment
is defined as (double(273.15) as k_offset) the function returns degrees Kelvin. When it is defined as (double(0.0) as k_offset),
the function returns degrees Celsius.
Function syntax:
T eval(string expr, T typespec)
T eval(string expr, tuple varfields, T typespec [, int cachesize
])
Evaluates the expression provided as the first argument and returns the result, whose data type is provided in the typespec
argument. The expression must resolve to the data type specified as typespec
; the function does not coerce the return's type.
The typespec
argument is evaluated only for its data type, and its value is ignored. You can specify literal values, as long as their
data type is unambiguous (1, 1L, 1.0, "string", and so on); or specify the null value function for the expected data type
(int(), long(), double(), string(), and so on); or provide an expression that unambiguously resolves to a value with the expected
data type.
For the two-argument form, the expr
argument is a simple StreamBase expression that does not depend on any variables for its resolution. For the three-argument
form, the expr
argument can reference fields in the second argument's tuple.
The function returns an evaluation exception under the following circumstances:
-
If you provide a
typespec
argument with a different data type than actually returned from the evaluation of the first argument. -
The expression in the
expr
argument fails to pass typechecking or resolves to null. -
In the three-argument form, you provide a null tuple for the second argument.
In any field that takes expressions, you can, of course, enter any valid expression to be evaluated without needing to use eval(). The eval() function provides a way for you to build an expression string dynamically or have it based on input:
eval(input.exprfield, int())
The three-argument form can be used to pass an entire input tuple or table row for evaluation against an expression:
eval("price > 200.00", input, bool())
eval("remainingQuantity == 0", current, bool())
The expression in the expr
argument is evaluated in a self-contained environment. Anything the expression needs for its evaluation must be passed in
by means of the varfields
argument. For example, to reference a constant or dynamic variable in the expr
argument, make sure that constant or dynamic variable is included in the input tuple. Let's say a module defines the constant
discount
, and has a dynamic variable named curr_rate
. You want to multiply field price
in the input tuple by both values. Use the tuple() function to create a tuple consisting of the entire input tuple, plus
the constant and variable:
eval("input.price * discount * curr_rate", tuple(input.* as *, discount, curr_rate), double())
The optional fourth argument is an advanced setting that specifies the size of the evaluation cache as the integer number
of unique compiled evaluations to be saved in the cache. In the absence of this argument, the evaluation cache is the same
as the current setting of the system property streambase.codegen.eval-cache-size
, which defaults to 100 for StreamBase Server and to 1000 for LiveView Server. The cache setting is container-wide, and setting
it during runtime can only increase the cache, and never shrink it. This prevents a single use of eval() at runtime from reducing
the cache size for everyone.
This argument must be in the fourth position; to use it with the two-argument form of this function, add an empty tuple for the second argument. (Remember that an empty tuple is not the same as a null tuple, which is disallowed for the second argument; see Null Tuples, Empty Tuples, No-Fields Tuples.)
This function is not expected to work at the command prompt with sbd --eval.
Function syntax:
string fromCodePoint(int value
)
Accepts an integer value and returns the corresponding ASCII character.
Function syntax:
int hash(T arg
)
Returns a hashed integer value derived from the argument, which can be any StreamBase data type.
Function syntax:
bool isnull(T value
)
Accepts all data types. Returns true if the argument is a null value.
For list and tuple arguments, isnull() returns true if and only if the top-level list or tuple is null. If a list contains any elements at all, even if all their values are null, it is not a null list. If a tuple contains any fields at all, it is not null even if all field values are null.
Function syntax:
string jsonpath(stringjson
, stringjsonpath
)
Accepts a JSON-formatted string and a JSONPath expression and returns the result of applying the jsonpath
expression as a filter on the input JSON string. JSONPath expressions are XPath-like pattern strings originally defined in JavaScript. This function is based on the Java implementation of JSONPath. See either of these pages for instructions on crafting JSONPath expressions.
For example, given the following JSON document:
{
"store": {
"book": [
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"ISBN": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"ISBN": "0-395-19395-8",
"price": 22.99
}
]
}
}
and passing the JSONPath string $.store.book[1].title
as the second argument, the result is:
(string) ["The Lord of the Rings"]
These input arguments work at the command prompt with sbd -e jsonpath(...)
, but remember to escape the quotes with \"
.
Function syntax:
int length(blobx
) int length(listL
) int length(stringx
)
Returns the length of its argument for argument types blob, list, and string. For a blob, returns a count of bytes. For a non-null list, returns the number of elements in the list, including any null elements. For a string, returns the number of graphemes in the string, for ASCII and all supported Unicode character sets.
If the argument is a null blob, null list, or null string, the returned value is null
.
Function syntax:
tuple new_tuple(tuplet
[,updated_value1
ASfield1_name
[,updated_value2
ASfield2_name
[, ... [,updated_valueN
ASfieldN_name
]]]])
Use the new_tuple()
function to modify the values of a small number of fields in a tuple, leaving the other fields unchanged.
The required first argument t
is a tuple value such as a named schema or the name of a tuple field in the incoming stream. Use one or more field update
arguments of the form value AS field_name
. You can use input.* AS *
to fill in the fields in t
with input stream fields that have matching field names.
You do not need to specify all fields in t
, but you cannot specify fields not in t
.
The following example both defines a tuple and immediately replaces the value of its x
field. This example is impractical but can illustrate new_tuple()
using sbd --eval:
new_tuple(tuple(1.0 AS x, 2.0 AS y, 3.0 AS z), 5 AS x)
For a more practical example, an input stream may have a schema defined as (Symbol string, Price double, NumSh double)
. The following expression in a Map operator's Additional Expressions grid inflates the incoming value of the Price field by 10%:
Action | Field Name | Expression |
---|---|---|
Add | InflatedInput | new_tuple(input, input.Price*1.10 as Price) |
Function syntax:
tuple new_tuple_subset(tuplet
,value1
ASfield1_name
[,value2
ASfield2_name
[, ... [,valueN
ASfieldN_name
]]])
This function returns a tuple with the schema of the tuple argument t
, and with the specified new field values substituted.
The required first argument t
is a tuple value such as a named schema or the name of a tuple field in the incoming stream. Use one or more field update
arguments of the form value AS field_name
. You can use input.* AS *
to fill in the fields in t
with input stream fields that have matching field names.
Unlike new_tuple()
, this function allows but ignores any update field names not in tuple t
.
Unlike new_tuple()
, this function requires all field names in t
to be specified.
Let's say an input stream has a schema defined as (Symbol string, Price double, NumSh double, CustID int)
, while the named schema Trade has a subset of that schema: (Symbol string, Price double, NumSh double)
. The following expression in a Map operator's Additional Expressions grid accepts the four-field input tuple and extracts only the three-field trade basics into a new field named TradeOnly.
Action | Field Name | Expression |
---|---|---|
Add | TradeOnly | new_tuple_subset(Trade(), input.* AS *) |
Function syntax:
tuple new_tuple_subset_loose(tuplet
,updated_value1
ASfield1_name
[,updated_value2
ASfield2_name
[, ... [,updated_valueN
ASfieldN_name
]]])
This function returns a tuple with the schema of the tuple argument t
, and with the specified new field values substituted.
The required first argument t
is a tuple value such as a named schema or the name of a tuple field in the incoming stream. Use one or more field update
arguments of the form value AS field_name
. You can use input.* AS *
to fill in the fields in t
with input stream fields that have matching field names.
Unlike new_tuple()
, this function allows but ignores any update field names not in tuple t
.
Like new_tuple()
, this function does not require all field names in t
to be specified.
This function is especially useful when using a named schema to specify a declared output stream schema. In a Map operator
immediately upstream of the output stream, use this function to reconcile a varying set of input fields to the output stream's
declared schema. If the output stream's declared schema is named TargetSchema
, use an Additional Expressions grid like the following:
Action | Field Name | Expression |
---|---|---|
Add | TempTuple | new_tuple_subset_loose(TargetSchema(), input.* AS *) |
Then add a second Map to expand TempTuple's fields back to the top level:
Action | Field Name | Expression |
---|---|---|
Add | * | TempTuple.* |
As an alternative, you can collapse both Map operators into one that uses the following:
Action | Field Name | Expression |
---|---|---|
Add | * | new_tuple_subset_loose(TargetSchema(), input.* AS *).* |
This setup allows you to change the definition of TargetSchema
elsewhere in a large application without having to also change the preceding Map operator.
Function syntax:
bool notnull(T value
)
Returns true if the argument is not a null value. Accepts all data types.
This function always returns the opposite of isnull()
.
Function syntax:
T nullif(Targ1
, Targ2
)
Takes two arguments of any type (the same type for both). If the resolved value of both arguments is the same, returns null. If the resolved value of the two arguments differs, returns the value of the first argument. This function can be used to evaluate fields before and after an event, and to send a field upstream only if it has changed, otherwise null.
Function syntax:
tuple parsecsv(tupletuplespec
, tupleoptions, stringcsv-string
)
Accepts a string formatted as comma-separated values (CSV) and returns a tuple. It takes three arguments:
-
tuplespec
- a specification for the returned tuple's schema, in the form of a named schema constructor function, or the tuple() function, or an expression that resolves to a schema definition -
tupleoptions
- a tuple defining one or more of the processing options described in the following table, ornull
if none are passed in -
csv-string
- A string with a list of items delimited by commas (default) or by the delimiter character you specified in tupleoptions
The tuple options are:
Option Name | Description | Data Type | Default Value |
---|---|---|---|
delimiter | Delimiter character separating individual fields | string | , (comma) |
quotechar | Quote character for string fields | string | " (double quote) |
header | A CSV string, which may be optionally used to specify the order of tuple fields in the inputs to the function. | string | The order of the fields in the output Schema type |
nullstring | String that specifies what null values in the input CSV string look like | string | "null" (Tuple.NULL_STRING) |
timestampformat | Timestamp format to be used to parse timestamp fields in the input strings | string | yyyy-MM-dd hh:mm:ss.SSSZ |
lenientTimestampParsing | Whether to attempt to parse timestamps using the default datetime format ("yyyy-MM-dd hh:mm:ss.SSSZ") or interval format ("ss.SSS") when a timestamp field cannot be parsed using the specified timestampformat | boolean | true |
No value is set for fields that do not have a value specified in the input string. If an input string has more fields than the field order specifies, the additional fields are ignored with a warning.
You can specify how values in CSV string inputs are ordered using the header option. The header field names must match those in the output schema. If the header contains any field names that are not in the schema, those fields are ignored, and a debug level log message is issued. If the header option is not specified, input values must be ordered as specified by the schema in the first argument.
Specify sub-tuples as escaped CSV strings (using \"
or the designated quote character). There is no option to specify the field order within fields that are tuples themselves.
See second example below.
Example (can be run with sbd --eval
; line broken for publication clarity):
parsecsv(tuple ("a" AS FirstName, "b" AS LastName, "c" AS StreetNum, "d" AS StreetName, 1 AS Zip, "e" AS email), null, "John,Doe,30,Main Street,43210,jdoe666@myisp.com")
Output:
((FirstName string, LastName string, StreetNum string, StreetName string, Zip int, email string)) John,Doe,30,Main Street,43210,jdoe666@myisp.com
Note
All three input arguments (tuple, tuple, string) are required. The tuple options can be null
but must be included.
The same example with street address formatted as a sub-tuple with delimiter set to semicolon:
sbd --eval 'parsecsv(tuple ("a" AS FirstName, "b" AS LastName, tuple ("c" AS StreetNum, "d" AS StreetName, 1 AS Zip) AS Address, "e" AS email), tuple(";" AS delimiter), "John;Doe;\"30;Main Street;43210\";jdoe666@myisp.com")
Output:
((FirstName string, LastName string, Address (StreetNum string, StreetName string, Zip int), email string)) John,Doe,"30,Main Street,43210",jdoe666@myisp.com
Function syntax:
tuple parsejson(tupletuplespec
, stringjson-string
[, stringtimestampformat
])
Accepts a JSON-formatted string, in either JSON object or JSON array format (as defined on http://www.json.org) and returns a tuple. The first argument, tuplespec
, is a specification for the returned tuple's schema, in the form of a named schema constructor function, or the tuple() function, or an expression that resolves to a schema definition.
Use the optional third argument to specify a timestamp format with which to parse the JSON string, using format patterns in
the manner of java.text.SimpleDateFormat. In the absence of a third argument, timestamps are parsed using the default format string yyyy-MM-dd HH:mm:ss.SSSZ
.
For example, for the named schema point(), defined as (int x, int y)
, the following expressions both return the tuple (x: 88, y: 99)
.
parsejson(point(), "[88, 99]")
parsejson(point(), "{x:88, y:99}")
Function syntax:
tuple parsejson_loose(tupletuplespec
, stringjson-string
[, stringtimestampformat
])
Works similarly to parsejson()
, but ignores json elements that do not fit the schema definition provided. Accepts a JSON-formatted string, in either JSON
object or JSON array format (as defined on http://www.json.org) and returns a tuple. The first argument, tuplespec
, is a specification for the returned tuple's schema, in the form of a named schema constructor function, or the tuple() function, or an expression that resolves to a schema definition.
Use the optional third argument to specify a timestamp format with which to parse the JSON string, using format patterns in
the manner of java.text.SimpleDateFormat. In the absence of a third argument, timestamps are parsed using the default format string yyyy-MM-dd HH:mm:ss.SSSZ
.
For the named schema point(), defined as (int x, int y)
, parsejson_loose()
returns the tuple (x: 88, y: 99)
from both the following expressions.
parsejson(point(), "[88, 99]")
parsejson(point(), "{x:88, y:99}")
In addition, it would return the same result from a JSON string such as {x:88, y:99, z:101}
, ignoring the extra element.
In the following example, the initial JSON element z
is ignored, and the missing tuple element z
is set to null, to return the tuple 1,"2,""null,3""
" (line break added for clarity):
parsejson_loose(tuple('' AS x, tuple('' AS y, tuple ( '' AS z, '' AS t) AS q) AS r),
\"{z:'a', x:'1', r:{y:2, q:{t:'3'}}}\")
Function syntax:
tuple random_tuple(schema-definition
)
Given a schema definition, usually in the form of a named schema constructor function, returns a tuple with each field filled with random data of the correct data type for each field. For example, for the named
schema, point
, with schema (double x, double y, double z)
, the named schema constructor function point()
creates a single tuple with that schema. Use point()
as the schema definition argument for random_tuple():
random_tuple(point())
This returns three double values selected at random, any of which might be null. Numeric values returned can be fractional or negative. Each time this function is run, another three random values are returned.
You can also use random_tuple() using tuple() to define a schema. In this case, you must provide value arguments and field names, as in the following example:
random_tuple(tuple(132.34 as price, "IBM" as symb, now() as tdate))
This at first appears to be a contradiction, because you provide values to fields that you then ask random_tuple() to replace with new, random values. However, in this context, the values you provide to tuple() are evaluated by random_tuple() only to determine the data type for each field. For this example, random_tuple() returns three random fields of type double, string, and timestamp.
Function syntax:
int toCodePoint(stringcharacters
[, intindex
])
Accepts a string of one or more characters and returns the corresponding ASCII code. If more than one character is provided
with no index
argument, the code corresponding to the first character is returned. You can also provide a zero-based index
value to specify which character in the provided string whose code you want returned.
Function syntax:
string tojson(tuplefieldname
[, boolverbosity
] [, stringtimestampformat
])
Accepts a tuple argument and returns the tuple's field values as a JSON object of key-value pairs, or optionally as a JSON
array of field values. The argument fieldname
is likely to be a field of type tuple in the incoming stream, but can be any expression that resolves to a single tuple.
Use the optional third argument to specify a timestamp format with which to parse timestamp fields in the incoming tuple,
using format patterns in the manner of java.text.SimpleDateFormat. In the absence of a third argument, timestamps are parsed using the default format string yyyy-MM-dd HH:mm:ss.SSSZ
. When specifying the third argument, the verbosity
second argument is no longer optional, and must be explicitly set.
By default, or with the optional verbosity
argument expressed as true
, the input tuple's field and value pairs are returned as a string formatted as a JSON object, as defined on http://www.json.org: a left brace followed by colon-separated name-value pairs (with field name quoted), and each pair separated by a comma,
followed by a closing right brace.
Set the second argument to false
to return a JSON array of field values instead: a left bracket following by comma-separated field values, followed by a closing right bracket.
For example, let's say an incoming tuple field named trade
has the following schema and values:
Field | Data Type | Value |
---|---|---|
A | double | 2.0 |
B | int | 333 |
C | string | "IBM" |
D | list(int) | range(0, 8, 2) = [0, 2, 4, 6] |
In this case, the expressions tojson(trade)
and tojson(trade, true)
return the following string:
{"A":2.0,"B":333,"C":"IBM","D":[0,2,4,6]}
To test this function at the command prompt with sbd --eval, use a construction like the following:
sbd --eval "tojson(tuple(2.0 as A, 333 as B, 'IBM' as C, range(0, 8, 2) as D))"
The expression tojson(trade, false)
returns the following string:
[2.0,333,"IBM",[0,2,4,6]]
Function syntax:
string uuid()
Returns a pseudo randomly generated 128-bit universally unique identifier string in the manner of java.util.UUID.randomUUID().
Aggregate functions are used on sets of data to return a single result. Aggregate functions evaluate columns of data from windows or tables. In EventFlow applications, aggregate functions can only be used in the following aggregate contexts:
-
In Aggregate operators, in aggregate expressions that apply to an aggregate window.
-
In Query operators that perform Read or Delete operations, in expressions that apply to all rows read.
-
In Query operators that perform Read or Delete operations, in aggregate expressions used in conjunction with a Group-By field.
In StreamSQL applications, aggregate functions can be used in SELECT statements related to aggregate or query read operations.
Expressions in aggregate contexts can contain a single aggregate function, or they can combine multiple aggregate functions together with constants and fields from an input stream or an associated Query Table. Aggregate expressions can include simple functions if the value of the simple function does not change over the lifetime of the Aggregate window.
Aggregate functions are organized into the following categories:
This category includes the following functions:
aggregatelist() |
aggregatelistnonnull() |
avg_list() |
concat() |
count_list() |
join(), joinlist() |
rank() |
sum_list() |
unique() |
Function syntax:
list(T) aggregatelist(T e)
Returns a list containing one element for each expression, e
, in the Aggregate context . The list element type, T, of the returned list is the same as the data type of e
.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) aggregatelistnonnull(T e)
Returns a list containing one element for each expression, e
, in the Aggregate context. The list element type, T, of the returned list is the same as the data type of e
. Unlike aggregatelist()
, this function excludes nulls from the constructed list.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) avg_list(list(T) L
)
Returns a list containing the element-wise average of the elements of the field of type list, L
, in the Aggregate context. The list element type, T, of the returned list is the same as the element type of the argument
list, or a coerced superset type. For example, an Aggregate operator might specify avg_list(intlistfield)
as an aggregation expression. With the following input tuples:
{intlistfield: [1, 2, 3], otherfield: x} {intlistfield: [4, 5, 6], otherfield: y}
When the Aggregate operator emits, the result is:
[2.5, 3.5, 4.5]
The length of component lists can vary. Short lists are padded with nulls to match the length of the longest list seen in the current context. Null values are ignored for the computation of the average.
See also the avg_list() function for list elements.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) concat(list(T) L
)
Returns a list containing the elements of the field of type list, L
, in the Aggregate context. The list element type, T, of the returned list is the same as the element type of the argument
list.
See also the concat() function for list elements.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(int) count_list(list(T) L
)
Returns a list containing the element-wise count of the elements of the field of type list, L
, in the Aggregate context, returning a list of type int. For example, an Aggregate operator might specify count_list(intlistfield)
as an aggregation expression. With the following input tuples:
{intlistfield: [1, 2, 3], otherfield: x} {intlistfield: [4, 5, 6, 7, 8], otherfield: y}
When the Aggregate operator emits, the result is:
[2, 2, 2, 1, 1]
The length of component lists can vary. Short lists are padded with nulls to match the length of the longest list seen in the current context. Null values are ignored for the computation of the count.
See also the simple function count_list().
Function syntax:
string join(stringseparator
, stringS
) string joinlist(stringseparator
, stringS
)
Returns a string formed by concatenating string field S
in the Aggregate context. The separator
string is inserted between each string. If the separator
string is empty, no separators are inserted. If the separator
string is null, join() returns null.
You can use joinlist()
as an alias for join()
in all contexts. In StreamSQL contexts, which has the reserved word JOIN, and in LiveView LiveQL, which partly inherits its
parser from StreamSQL, you must either use the joinlist()
alias, or use escaped identifier syntax.
Also see the simple function join().
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(double) rank(double field
)
Returns a list containing the rank ordering of values of a field of type double in the Aggregate context, in ascending order.
The input type must be double or be cast to double. If the input field contains duplicated values, the rank returned for each
of them is all the average of their ranks. For example, if a field called priority contained the values 5, 1, 6, 2
in the aggregate window, then rank(priority)
returns [3, 1, 4, 2]
. If the values for priority instead were 5, 2, 6, 2
, then rank(priority)
returns [3, 1.5, 4, 1.5
].
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) sum_list(list(T) L
)
Returns a list containing the element-wise sum of the elements of the field of type list, L
, in the Aggregate context. The list element type, T, of the returned list is the same as the element type of the argument
list. For example, an Aggregate operator might specify sum_list(intlistfield)
as an aggregation expression. With the following input tuples:
{intlistfield: [1, 2, 3], otherfield: x} {intlistfield: [4, 5, 6], otherfield: y}
When the Aggregate operator emits, the result is:
[5, 7, 9]
The length of component lists can vary. Short lists are padded with nulls to match the length of the longest list seen in the current context. Null values are ignored for the computation of the sum.
See also the sum_list() function for list elements.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) unique(Tfield
, intnumuniq
)
Returns a list containing the unique values in the specified field
in the Aggregate context. The data type of the returned list is the same as that of the field
.
The unique() function maintains an exact list of unique values until numuniq
is reached, and does not report changes as the number of values added exceeds numuniq
. If a value in the reported unique list is removed, it is replaced by some other unique value, if one exists. If there are
more than numuniq
values, there is no guarantee which numuniq
values are reported.
Also see the simple function unique().
This category includes the following functions:
Function syntax:
T callcpp(stringclass
[,arg1
, ...,argn
])
As an alternative to using callcpp()
, you can define an alias for your custom functions. See Using Function Aliases.
Use an alias or callcpp()
to run a custom C++ aggregate function directly from a StreamBase operator that uses an expression. The return type, T, of
callcpp()
is the same as the return type of the called function. Custom C++ aggregate functions are functions you build with the StreamBase
C++ Client API, specifically those that inherit from the sb::PluginAggregate
class. You can use the aggregate form of callcpp()
in any aggregate expression. (To use callcpp()
in simple expressions, refer to the simple callcpp().)
For callcpp()
to locate the function being called, you must specify the location of the containing DLL or .so
file, and must register the function with a <custom-function>
element in the StreamBase Server configuration file. You specify the function as an aggregate function by means of the type
attribute of <custom-function>
. See C++ Function Overview for details.
To learn about coding custom C++ functions, refer to Creating Custom C++ Functions in the API Guide.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T calljava(stringclass
[,arg1
, ...,argn
])
As an alternative to using calljava()
, you can define an alias for your custom functions. See Using Function Aliases.
Use an alias or calljava()
to run a custom Java aggregate function directly from a StreamBase operator that uses an expression. The return type, T,
of calljava()
is the same as the return type of the called function. Custom Java aggregate functions are functions you build with the StreamBase
Java Client library, specifically those that extend the AggregateWindow
class. You can use the aggregate form of calljava()
in any aggregate expression. (To use calljava()
in simple expressions, refer to the simple calljava().)
The calljava()
function distinguishes simple from aggregate functions by the number of arguments. For custom simple functions, you must
specify both class and method names. For custom aggregate functions, you specify only the class name, because these functions
must inherit from AggregateWindow
, which has an init()
method that is automatically run.
To learn about coding custom Java functions, refer to Using the StreamBase Java Function Wizard in the API Guide. For information on the classpath requirements for custom Java functions, see Java Function Overview.
See Return Types and Argument Types for a discussion of using Java primitives or Java objects in the function you write to be called with calljava()
.
This category includes the following functions:
Function syntax:
double alpha(doubleindex
, doubleprice
, doubledividend
)
Returns a double that may indicate the part of a stock's movement that is independent of the index's movement.
Examples of stocks increasing in alpha could be those with takeover rumors, under strong syndicate manipulation, or having strong expectations of good results; that is, factors that make them increasingly move independently of the index.
The beta of a stock is defined as the slope of a regression line in a scatter graph of paired data points representing percentage
changes of an index (return of an index) and the corresponding change (return of stock) in the price of a stock. The StreamBase
beta()
function also includes the stock dividend in the calculation of the stock return.
The alpha is the point where this regression line cuts the Y axis. To reiterate, a stock's beta can be described as that part of a stock's movement that is influenced by the index, while a stock's alpha can be regarded as that part of a stock's movement that is independent of the index's movement.
The alpha
function's arguments are:
-
index
: the end-of-period market index (double) -
price
: the end-of-period stock price (double) -
dividend
: the stock dividend of the period (double)
The stock price must have already figured in any stock split (or reverse split). The alpha is usually calculated over a period of 61 months. This aggregate function assumes that the input data is already normalized. That is, index, price, and dividend are of the same period. If they are of different periods, first normalize the yield (return), and then use the intercept function to calculate alpha.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double avg(doublefield
) int avg(intfield
) long avg(longfield
) timestamp avg(timestampfield
)
Returns the average value of a numeric field, field
, in all tuples in the Aggregate context. The field's type can be a double, int, long, or timestamp. For a returned average
of timestamp values to make sense, the aggregate field must contain all interval timestamps or all absolute timestamps, but
not both.
See also the avg() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int avgif(Tfieldname
, boolpredicate
)
Returns the average of the values for which the specified numeric field is not null and for which the predicate evaluates to true. The predicate expression can be any valid expression on any field, and is not restricted to values from the specified field.
For example: avgif(input.income, input.age < 35)
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
bool andall(bool f
)
The argument f
represents a field of type boolean in an Aggregate operator, a Query Table column, or an expression that depends on one or
more fields in an Aggregate operator or Query Table.
This function evaluates the values in the specified field, and returns the results of a logical AND operation on all values.
For example, suppose field BoolField
contains the following values in successive rows of an Aggregate window or Query Table column: true, true, false, true, false
. In that case, andall(BoolField)
returns false
. Use andall() as an aggregate truth detector function.
Null arguments have special handling. A null
argument does not change the result of the evaluation if a false
value is among the arguments. Thus, if BoolField
contains null, true, false, null, true
, then andall(BoolField)
still returns false
. However, if one or more arguments is null while all other arguments are true
, the function returns null
. So if BoolField
contains null, true, true, null, true
, then andall(BoolField)
returns null
.
The andall() function follows the logic expressed in this statement:
if any argument is false, return false
else if any argument is null, return null
else return true
See also the orall() aggregate function and the simple versions of andall() and orall().
Function syntax:
double beta(doubleindex
, doubleprice
, doubledividend
)
Returns a double that may indicate the tendency of a security's returns to respond to swings in the market. A beta of 1
indicates that the security's price will move with the market. A beta of less than 1 means that the security will be less
volatile than the market. A beta value that is greater than 1 indicates that the security's price will be more volatile than
the market.
For related information, see the description of the alpha function.
Beta is a measure of a security's or portfolio's volatility, or systematic risk, in comparison to the market as a whole. For
example, if a stock's beta is 1.2
, it is theoretically 20% more volatile than the market.
The function's arguments are:
-
index
: the end-of-period market index (double) -
price
: the end-of-period stock price (double) -
dividend
: the stock dividend of the period (double)
The stock price must have already figured in any stock split (or reverse split). The beta is usually calculated over a period of 61 months. This aggregate function assumes that the input data is already normalized. That is, index, price, and dividend are of the same period. If they are of different periods, the yield should be normalized first and then use the slope function to calculate beta.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double correlation_coefficient(doubleprice
, doubleindex
) double correlation_coefficient(intprice
, intindex
) list(double) correlation_coefficient(stringname
, intindex
, doublefield1
, doublefield2
, ...)
The first two syntaxes return the Pearson correlation coefficient for two fields, price
and index
, for all tuples in the Aggregate context. This function takes two input arguments (both can be an int or a double) and returns
a double. The correlations coefficient is a measure that determines the degree to which two variable's movements are associated.
The correlation coefficient varies from -1.0
to 1.0
. The value -1.0
indicates perfect negative correlation, and 1.0
indicates perfect positive correlation. 0
means no correlation.
Note
-
The correlation_coefficient of 0 items is null.
-
The correlation_coefficient of 1 item is 1.0.
-
The correlation_coefficient of N identical items is 1.0.
The third syntax, list(double) correlation_coefficient(string name
, int index
, double field1
, double field2
...), returns a list of correlation coefficients, containing as many members as there are field arguments (there must be
at least two). The output list of doubles is in the same order as the field arguments. Given n
field names, the index argument specifies which of those n
fields holds the primary variable (for which field the returned correlation coefficient is 1.0). The value of index must
not change over the life of the operator.
The name argument specifies the type of correlation coefficient to compute, taken from this list:
Click any name above to see the description of the statistic. You can abbreviate and approximate the spellings you supply for the name, which must not change over the life of the operator. You can call any of these functions directly by its full name, as well as using the above syntax. However, when you call these functions directly, only two variables can be correlated per call and the output is a scalar double.
If any input field contains null values, the function returns null for its correlation coefficient.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double correlation_coefficientp(doubleprice
, doubleindex
) double correlation_coefficientp(intprice
, intindex
)
Returns the correlation coefficient for two fields, price
and index
, for all tuples in the Aggregate context. The correlation_coefficientp
function is similar to the correlation_coefficient
function, but use correlation_coefficientp
when the data provided is the entire population, while the correlation_coefficient
function is used for a random sample. The correlation_coefficientp
function is calculated using the biased (or n) method. The correlation_coefficient
function is calculated using the unbiased (or n-1) method.
This function takes two input arguments (both can be an int or a double) and returns a double. The correlations coefficient
is a measure that determines the degree to which two variable's movements are associated. The correlation coefficient varies
from -1.0
to 1.0
. The value -1.0
indicates perfect negative correlation, and 1.0
indicates perfect positive correlation. 0 means no correlation.
Note
-
The correlation_coefficientp of 0 items is null.
-
The correlation_coefficientp of 1 item is 1.0.
-
The correlation_coefficientp of N identical items is 1.0.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int count([Texpr
]) long countlong([Texpr
])
Returns the number of tuples in the Aggregate context. If this function is called with an argument, then tuples for which the argument evaluates to null do not contribute to the count. The argument can be of any data type, T, but its actual value is ignored, other than the fact that it is non-null. If no argument is specified, then all tuples are included in the count.
countlong()
works exactly as count()
does, except that it returns a long value rather than an int.
To clarify the operation of count()
, consider the following sequence of values for x:
x {1,2,3,4,5,6,7,8,9,10}
Calling count(x)
returns 10. There are ten items in the list.
Calling count(x > 5)
also returns 10. This is because it is counting the sequence {f,f,f,f,f,t,t,t,t,t} (five false, five true). All elements
in this sequence are non-null, so all are counted.
Calling
returns 5, because it is counting the sequence {null, null, null, null, null, 0, 0, 0, 0, 0} (five nulls, five zeros). The
five non-null elements are counted. Any non-null value other than 0 would also count in this expression.
count(if x > 5 then 0 else null)
The following example clarifies the count()
function's behavior with query tables.
Let's say we have a table named flattop with the following schema and contents:
Field Name | Data Type | Contents of Three Rows |
---|---|---|
alpha | int | 2, 4, 6 |
beta | long | 3.0, 5.0 |
The expression select count(flattop.*) from in, tablet ...
returns
3, 2
which is the number of non-null values in the alpha
field, plus the number of non-null values in the beta
field.
If you want a count of the number of items in the window, use an expression like the following: select count() as c from in, flattop ...
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int count_distinct(T arg1 [, T arg2 ...]])
Returns the number of tuples in the Aggregate context with specified values for the parameters. This function must be called with at least one argument of any data type. Multiple arguments can be of different types. Use commas to separate subsequent arguments. For the purpose of this function, null values of a field are considered equivalent to each other.
See also the count_distinct_elements() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int countif(boolpredicate
) int countif(Tfieldname
, boolpredicate
)
With one argument, returns the number of rows for which the specified predicate argument evaluates to true. When used with two arguments, it returns the number of rows for which the specified field is not null and the predicate evaluates to true.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double covariance(doubleprice
, doubleindex
) double covariance(intprice
, intindex
) double covariance(longprice
, longindex
)
Returns the covariance for two fields, price
and index
, for all tuples in the Aggregate context. This function takes two input arguments (both can be a double, int, or long) and
returns a double. In financial applications, covariance can be used to measure the degree to which returns on two risky assets
move in tandem. A positive covariance means that asset returns move together. A negative covariance means returns vary inversely.
In the arguments, the price
field is a stock price, and the index
is the industry segment index.
One method of calculating covariance is to look at return deviations from the expected return in each scenario. Another method is to multiply correlation between the two variables by the standard deviation of each variable.
For example, if you owned one asset that had a high covariance with another asset that you did not own, then you would receive very little increased diversification by adding the second asset. Of course, the opposite is true as well, adding assets with low covariance to your portfolio would lower the overall portfolio risk.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double covariancep(doubleprice
, doubleindex
) double covariancep(intprice
, intindex
) double covariancep(longprice
, longindex
)
Returns the covariance for two fields, price
and index
, for all tuples in the Aggregate context. The covariancep
function is similar to the covariance
function, but use covariancep
when the data provided is the entire population, while the covariance
function is used for a random sample. The covariancep
function is calculated using the biased (or n) method. The covariance
function is calculated using the unbiased (or n-1) method.
This function takes two input arguments (both can be a double, int, or long) and returns a double. In financial applications, covariance can be used to measure the degree to which returns on two risky assets move in tandem. A positive covariance means that asset returns move together. A negative covariance means returns vary inversely.
In the arguments, the price
field is a stock price, and the index
is the industry segment index.
One method of calculating covariance is by looking at return deviations from the expected return in each scenario. Another method is to multiply correlation between the two variables by the standard deviation of each variable.
For example, if you owned one asset that had a high covariance with another asset that you did not own, then you would receive very little increased diversification by adding the second asset. Of course, the opposite is true as well, adding assets with low covariance to your portfolio would lower the overall portfolio risk.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double exp_moving_avg(doublevalue
, intcount
[, doubleweight
]) double exp_moving_avg(intvalue
, intcount
[, doubleweight
]) double exp_moving_avg(longvalue
, intcount
[, doubleweight
])
Returns a weighted average where a percentage of each successive value is used with a percentage of the existing average to construct the new average. The effect of any one value on the average diminishes exponentially as new values are added.
The function takes these arguments:
value
-
The value (double, int, or long) to be averaged.
count
-
The integer number of periods over which a simple average is used prior to starting the EMA.
weight
-
An optional weighting value (a double between 0 and 1) that is to be applied to both the current period's value (CPV) and prior period's average (PPA) when the Exponential Moving Average (EMA) is calculated.
If the weight is not between 0 and 1 or is not specified, it is calculated from the following formula:
weight = 2 / (count + 1)
This function returns a double. It is most commonly used with an unlimited aggregate window size (although it does not have
to be). When configuring an Aggregate operator for an unlimited window, do not configure the Advance and Size fields in StreamBase Studio, which results in unlimited window size. Instead, set the Emit field to 1
so that the aggregate emits for every tuple.
The
function is calculated using the following formula:
exp_moving_avg
EMA
= (CPV
*weight
) + (PPA
* (1 -weight
))
-
EMA
: exponential moving average so far -
PPA
: exponential moving average after the previous tuple -
CPV
: value of current tuple
Initially there is no PPA. A simple moving average (SMA) is computed instead over as many tuples as specified by the second parameter. When the SMA is computed, a Null value is emitted for each input tuple until the Aggregate operator receives enough tuples to compute the SMA. Following this, the EMA is calculated.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double goodman_kruskal_gamma(double field1, double field2)
Returns the Goodman-Kruskal Gamma correlation coefficient between values of two fields in the Aggregate context as a double value when the window emits. The Goodman-Kruskal Gamma statistic is a rank correlation of concordant pairs. If any field values are null, the returned value is null.
To perform multivariate Goodman-Kruskal Gamma correlations, use the aggregate correlation_coefficient
function.
Function syntax:
double intercept(doublex
, doubley
) double intercept(intx
, inty
)
For the data set of fields x
and y
for all tuples in the Aggregate context, a best-fit linear regression line is calculated, then the function returns the value
where the line intersects the y axis. The method used to find the line that best fits a group of points is called least squares (or linear least squares). This function takes two input arguments of the same type (both can be an int or a double) and returns a double.
This function returns zero if the Aggregate window contains one row or duplicate rows containing the same values.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double kendall_tau(double field1, double field2)
Returns the normalized Kendall Tau distance metric between values of two fields in the Aggregate context as a double value when the window emits. The Kendall Tau distance statistic counts the number of pairwise disagreements between the ranks of corresponding items in two fields. Larger values indicate greater disagreement in rank ordering, such that a return value of 0 means complete agreement and 1.0 means complete disagreement.
To perform multivariate Kendall Tau correlations, use the aggregate correlation_coefficient
function.
Function syntax:
T max(T f
)
The argument f
represents a field in an Aggregate operator, a Query Table column, or an expression that depends on one or more fields in
an Aggregate operator or Query Table.
This function returns the maximum non-null value using the greater-than relational operation appropriate for the data type
of f
. The data type, T, of the returned value is the same as the argument.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
See also the simple version of max(), the aggregate function maxn() below, and the maxelement() function for lists. In an aggregate context, this aggregate function max(
returns the maximum list in the aggregate window. Use the simple function list
)max(
to determine the larger of two or more lists. By contrast, use list
, , list
)maxelement()
to compare the element values of a single list.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) maxn(intcount
, Tf
)
The argument f
represents a field in an Aggregate operator, a Query Table column, or an expression that depends on one or more fields in
an Aggregate context.
Returns a list with count
elements containing the count
number of maximum non-null values for field f
in the Aggregate context. Accepts fields of all data types and returns a list whose elements are the same as the type of
field f
. The value of count
must be nonnegative, cannot be null, and must remain the same for the duration of the Aggregate window. For count
=0, an empty Aggregate window, or an empty Query Table column, an empty list is returned.
Maximum values are determined with the greater-than relational operation appropriate for the data type of f
. StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type
on StreamBase Data Types.
See also the simple version of max(), the aggregate version of max(), and the maxelement() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double median(doublex
) double median(intx
) double median(longx
)
Returns the median value of x
for this Aggregate context. If the window has an even number of elements, returns the average of the middle two.
See also the median() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T min(T f
)
The argument f
represents a field in an Aggregate operator, a Query Table column, or an expression that depends on one or more fields in
an Aggregate operator or Query Table.
This function returns the minimum non-null value using the less-than relational operation appropriate for the data type of
f
. The data type, T, of the returned value is the same as the argument.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
See also the simple version of min(), the aggregate function minn() below, and the minelement() function for lists. In an aggregate context, this aggregate function min(
returns the minimum list in the aggregate window. Use the simple function list
)min(
to determine the smaller of two or more lists. By contrast, use list
, , list
)minelement()
to compare the element values of a single list.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) minn(intcount
, Tf
)
The argument f
represents a field in an Aggregate operator, a Query Table column, or an expression that depends on one or more fields in
an Aggregate context.
Returns a list with count
elements containing the count
number of minimum non-null values for field f
in the Aggregate context. Accepts fields of all data types and returns a list whose elements are the same as the type of
field f
. The value of count
must be nonnegative, cannot be null, and must remain the same for the duration of the Aggregate window. For count
=0, an empty Aggregate window, or an empty Query Table column, an empty list is returned.
Minimum values are determined with the less-than relational operation appropriate for the data type of f
. StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type
on StreamBase Data Types.
See also the simple version of min(), the aggregate version of min(), and the minelement() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
bool orall(bool f
)
The argument f
represents a field of type boolean in an Aggregate operator, a Query Table column, or an expression that depends on one or
more fields in an Aggregate operator or Query Table.
This function evaluates the values in the specified field, and returns the results of a logical OR operation on all values.
For example, suppose field BoolField
contains the following values in successive rows of an Aggregate window or Query Table column: true, true, false, true, false
. In that case, orall(BoolField)
returns true
. Use orall() as an aggregate falsity detector function.
Null arguments have special handling. A null
argument does not change the result of the evaluation if a true
value is among the arguments. Thus, if BoolField
contains null, false, true, null, false
, then orall(BoolField)
still returns true
. However, if one or more arguments is null
while all other arguments are true
, the function returns null
. So if BoolField
contains null, false, false, null, false
, then orall(BoolField)
returns null
.
The orall() function follows the logic expressed in this statement:
if any argument is true, return true
else if any argument is null, return null
else return false
See also the andall() aggregate function and the simple versions of andall() and orall().
Function syntax:
double pearson(double field1, double field2)
Returns the Pearson product-moment correlation coefficient between values of two fields in the Aggregate context as a double
value when the window emits that indicates the degree of linear dependence between two variables. The Pearson statistic is
defined as the covariance of the two variables divided by the product of their standard deviations. The term product-moment refers to the mean (the first moment about the origin) of the product of the mean-adjusted random variables. A return value
of -1
indicates a perfect negative linear correlation. A value of 0
indicates no correlation (random variates). A value of 1
indicates perfect positive linear correlation. If any field values are null, the returned value is null. If the variance
of either input variable is zero, the function returns NaN
.
To perform multivariate Pearson correlations, use the aggregate correlation_coefficient
function or the aggregate correlation_coefficientp
function.
Function syntax:
T percentile(Tfield
, doublepct
)
Returns the value of the specified field with percentile rank equal to the specified pct
value. If there is no value in the Aggregate context at exactly pct
percentile, then the value with the greatest percentile less than pct
is returned. The data type of the specified field
(and of the corresponding result) must be one of the comparable types in the expression language, as listed on the Data Types page.
Function syntax:
double product(doublef
) int product(intf
) long product(longf
)
Returns the multiplication product computed for field f
for all tuples in the Aggregate context. Supports data types double, int, and long.
See also the product() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double slope(doublex
, doubley
) double slope(intx
, inty
)
For the data set of fields x
and y
for all tuples in the Aggregate context, a best-fit linear regression line is calculated, and then the function returns the
line's slope. The method that is used to find the line that best fits a group of points is called least squares (or linear least squares). The slope
function takes two input arguments of the same type (both can be an int or a double) and returns a double.
This function returns zero if the Aggregate window contains one row or duplicate rows containing the same values.
The slope() function returns a unitless double value that does not represent an angle, and is therefore in neither degrees nor radians. The internal calculation takes the tan(theta), where theta is the angle of incline of the best linear fit to the data. This calculation returns the same value whether theta is internally expressed in degrees or radians.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double spearmans_rank(double field1, double field2)
Returns the Spearman's Rank correlation coefficient between values of two fields in the Aggregate context as a double value
when the window emits. Spearman's statistic is defined as the Pearson correlation coefficient between the ranked variables.
A return value of -1
indicates a perfect negative correlation (reverse ordering). A value of 0
indicates no concordance of ranks. A value of 1
indicates identical rank orderings. If any field values are null, the returned value is null.
To perform multivariate Spearman's Rank correlations, use the aggregate correlation_coefficient
function.
Function syntax:
double stdev(doublef
) double stdev(intf
) double stdev(longf
) timestamp stdev(timestampf
)
Returns the standard deviation for field f
for all tuples in the Aggregate context. The function takes one input argument. If the input argument is an int, double,
or long, it returns a double. If the input argument is a timestamp, it returns an interval timestamp. For timestamp values,
the aggregate field should contain all interval timestamps or all absolute timestamps, but not both. For windows with size
< 2, returns null.
Standard deviation is a measure of the dispersion of a set of data from its mean. The more spread apart the data is, the higher the deviation. For example, in financial applications, standard deviation could be applied to the annual rate of return of an investment to measure the investment's volatility (risk). A volatile stock would have a high standard deviation. In mutual funds, the standard deviation indicates how much the return on the fund is deviating from the expected normal returns.
See also the stdev() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double stdevp(doublef
) double stdevp(intf
) double stdevp(longf
) timestamp stdevp(timestampf
)
Returns the standard deviation for field f
for all tuples in the Aggregate context. The function takes one input argument. If the input argument is an int, double,
or long, it returns a double. If the input argument is a timestamp, it returns an interval timestamp. For timestamp values,
the aggregate field should contain all interval timestamps or all absolute timestamps, but not both. For windows with size
< 2, returns 0.
The stdevp()
function is similar to the stdev()
function: with stdevp()
the data provided is the entire population, while with stdev()
, the data provided is treated as a random sample. The stdevp()
function is calculated using the biased (or n) method. The stdev()
function is calculated using the unbiased (or n-1) method.
See also the stdevp() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double sum(doublef
) int sum(intf
) long sum(longf
) timestamp sum(timestampf
)
Returns the sum of field f
computed for all tuples in the Aggregate context. The field can be an double, int, long, or timestamp.
When the field type is timestamp, summing members of the field follows the rules for adding timestamps as shown in the table in timestamp Data Type. That is, you cannot sum an aggregate of two or more absolute timestamps. However, you can sum an aggregate field composed of all interval timestamps, or one composed of exactly one absolute timestamp plus one or more interval timestamps.
See also the sum() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int sumif(Tfieldname
, boolpredicate
)
Returns the sum of the values for which the specified numeric field is not null and for which the predicate evaluates to true. The predicate expression can be any valid expression on any field, is not restricted to values from the specified field.
For example: sumif(input.income, input.age < 35)
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double variance(doublef
) double variance(intf
) double variance(longf
)
Returns the variance for field f
for all tuples in the Aggregate context. The function takes one input argument (an int, double, or long) and returns a double.
For windows with size < 2, returns null.
Variance is a measure of the dispersion of a set of data points around their mean value. It is a mathematical expectation of the average squared deviations from the mean. Variance measures the variability (volatility) from an average. Volatility is a measure of risk. So for example, this statistic can help determine the risk an investor might take on when purchasing a specific security.
See also the variance() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double variancep(doublef
) double variancep(intf
) double variancep(longf
)
Returns the variance for field f
for all tuples in the Aggregate context. The function takes one input argument (an int, double, or long) and returns a double.
For windows with size < 2, returns 0.
The variancep()
function is similar to the variance()
function: with variancep()
the data provided is the entire population, while with variance()
, the data provided is treated as a random sample. The variancep()
function is calculated using the biased (or n) method. The variance()
function is calculated using the unbiased (or n-1) method.
See also the variancep() function for lists.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double vwap(double price, double volume)
Returns the volume-weighted average-price value from (typically) the price
and volume
values, for all tuples in the Aggregate context. Both arguments are doubles.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T1 withmax(Tcompare_expr
, T1result_expr
)
Returns the result_expr
for this Aggregate context that has the maximum corresponding compare_expr
using the greater-than relational operator. The data type, T1, of the returned value is the same as the result_expr
. The data type of the compare_expr
can be any data type, not necessarily the same as the result_expr's
type.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T1 withmin(Tcompare_expr
, T1result_expr
)
Returns the result_expr
for this Aggregate operator's window that has the minimum corresponding compare_expr
using the less-than relational operator. The data type, T1, of the returned value is the same as the result_expr
. The data type of the compare_expr
can be any data type, not necessarily the same as the result_expr's
type.
StreamBase data types are comparable with relational operators in different ways, as listed in the entry for each data type on StreamBase Data Types.
This category includes the following functions:
Function | Use in Aggregate Operator? | Use in Query Read on Query Tables? |
---|---|---|
closeval() | yes | no |
firstn() | yes | yes |
firstnonnullval() | yes | yes |
firstval() | yes | yes |
getOldestWindowID() | yes | no |
getWindowID() | yes | no |
isOldestWindow() | yes | no |
lag() | yes | yes |
lastn() | yes | yes |
lastnonnullval() | yes | yes |
lastval() | yes | yes |
openval() | yes | no |
All windowing functions work on columns of data in Aggregate windows, and most windowing functions also work on table columns returned by Query Read operations on Query Tables.
Note
When used with a Query operator, these functions are specified in the Output Settings tab of the Query operator's Properties view, and operate on the initial selection of rows from the Query Table as specified in the Query Settings tab for a Query-Read operation. Remember that the order of rows returned from the Query Table depends on the Read predicate expression and on the sort order, if specified.
For example, if the Query-Read operation specifies Read All Rows on a table whose primary index is a string field, and you specify running the firstn() or lastn() functions on that selection, those functions return fields from the first n
rows (or last n
rows) in alphabetical order by primary index.
The three functions getWindowID(), isOldestWindow(), and getOldestWindowID() are pseudo-aggregate functions in that they only operate in the context of an Aggregate operator, but they do not query or evaluate the tuples flowing through the Aggregate operator. Instead, they report on the state of the various windows within the Aggregate.
Function syntax:
double closeval([stringdimension
]) int closeval([stringdimension
]) long closeval([stringdimension]
) timestamp closeval([stringdimension
])
The closeval()
function behaves the same as openval() except that it returns the upper limit of the specified dimension when the window is closed. Like openval(), this function
works in Aggregate operators, and not in Query operators.
The value returned by closeval()
might differ from any of the actual tuple values included in the window. For example, consider an Aggregate operator with
a dimension CountDim
, where the windows created by that dimension include a window with a range of 5 to 10. If the actual tuple values in the
window are 6, 7, and 9, then closeval(CountDim)
returns 10
for that window, not 9
.
If the dimension name is omitted and the aggregate has only one dimension, then closeval()
uses that dimension. If the dimension name is omitted and the aggregate has more than one dimension, the result is a typecheck
error.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) firstn(intcount
, Tf
)
Returns a list with count
elements containing the first count
values for field f
in an Aggregate operator's window or in a Query Table column. Accepts fields of all data types and returns a list whose elements
are the same type as the type of field f
. Field f
can contain null values, and such fields are returned in the list.
The value of count
must be nonnegative, cannot be null, and must remain the same for the duration of the Aggregate window. For count
=0, an empty Aggregate window, or an empty Query Table column, an empty list is returned.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T firstnonnullval(T f
)
Returns the first non-null value for field f
in the Aggregate operator's window or in a Query Table column. Accepts all data types and returns the same type as its argument.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T firstval(T f
)
Returns the first value for field f
in the Aggregate operator's window or in a Query Table column. Accepts all data types and returns the same type as its argument.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int getOldestWindowID()
Returns an integer that represents the ID of the oldest window still open of the current aggregation group.
This function works only in conjunction with the Aggregate operator.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
int getWindowID()
Returns an integer representing the ID of the current window. All windows of each Aggregate operator have a unique ID. The larger the value of the window ID, the younger the window is.
This function works only in conjunction with the Aggregate operator.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
bool isOldestWindow()
Returns true
if the window in which this function is called is the oldest of the current aggregation group.
This function works only in conjunction with the Aggregate operator.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T lag(T f
)
Returns the next-to-last value for field f in the Aggregate operator's window or in a table column. If the window or table column has only one row or zero rows, lag() returns null. Accepts all data types and returns the same type as its argument.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Example usage in an aggregate context: avg(
where fieldname
/lag(fieldname
))fieldname
is a numeric value such as a price field.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
list(T) lastn(intcount
, Tf
)
Returns a list with count
elements containing the last count
values for field f
in an Aggregate operator's window or in a Query Table column. Accepts fields of all data types and returns a list whose elements
are the same type as the type of field f
. Field f
can contain null values, and such fields are returned in the list.
The value of count
must be nonnegative, cannot be null, and must remain the same for the duration of the Aggregate window. For count
=0, an empty Aggregate window, or an empty Query Table column, an empty list is returned.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T lastnonnullval(T f
)
Returns the last non-null value for field f
in the Aggregate operator's window or in a Query Table column. Accepts all data types and returns the same type as its argument.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
T lastval(T f
)
Returns the last value for field f
in the Aggregate operator's window or in a Query Table column. Accepts all data types and returns the same type as its argument.
This function works best in conjunction with the Aggregate operator. See the note above about using this function with Query operators.
Back to Top ^ Back to Aggregate Overview ^
Function syntax:
double openval([stringdimension
]) int openval([stringdimension
]) long openval([stringdimension
]) timestamp openval([stringdimension
])
Can only be used in an Aggregate operator, not in a Query Operator. Returns the lower limit of the specified dimension for the current window. The dimension name is passed as a quoted string, and must match the name of a dimension in the Aggregate operator. This function works even if the dimension specified is not the one that actually opened the current window. In that case, the value returned is the value of the dimension at the time the window was created.
The value returned by openval
might differ from any of the actual tuple values included in the window. For example, consider an Aggregate operator with
a dimension CountDim
, where the windows created by that dimension include a window with a range of 5 to 10. If the actual tuple values in the
window are 6, 7, and 9, then openval(CountDim)
returns 5
for that window, not 6
.
If the dimension name is omitted and the aggregate has only one dimension, then openval()
uses that dimension. If the dimension name is omitted and the aggregate has more than one dimension, the result is a typecheck
error.