Predicate String Expressions

For predicates constructed on the fly, the LPAR form described in the previous section may be appropriate, but in other cases the string format described here is more convenient and easier to understand. The string form is passed using the string-type lpar LPAR_STR_PREDICATE or the block-type lpar LPAR_BLK_PREDICATE. These lpars can be used anywhere an LPAR_LST_PREDICATE lpar can be used. For example, the earlier predicate that restricts searches to records dated after January 3, 2002 could also be expressed as:

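/* Restrict searches to records whose "Creation Date" field is after
   January 3, 2002. Double quotes inside the C literal are escaped. */
lpar_t predicate;
predicate = lpar_create_str(LPAR_STR_PREDICATE,
    "$\"Creation Date\" > DATE \"January 3, 2002\""
);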

The predicate block form (LPAR_BLK_PREDICATE) accepts exactly the same expression syntax as the string form; it is provided for handling very long expressions. An LPAR string is limited to about 1,000 characters, whereas a block has no length restriction. The null-terminated string form is easier to work with, however, so it is the better choice in most cases. The block form can be used anywhere the string form can, even where the documentation does not say so explicitly: wherever LPAR_STR_PREDICATE is mentioned in this documentation, LPAR_BLK_PREDICATE can be used as well.

A predicate string is similar to an SQL WHERE clause expression. It is built from unary operators, binary operators, and data elements. A data element is a field of a record, or a string, block, numeric, or Boolean constant. The data elements are listed below; each entry shows one or more examples followed by a description.

?TRUE?, ?FALSE?

Boolean constants are the words true and false enclosed in question marks. They are case insensitive.

123, 0777, 0x8FFF

Integer constant values. They follow the standard "C" conventions for decimal, octal and hexadecimal integers. Values must be within the defined range for integers.

123.45, 0.17e-10

Floating point values. They follow the standard "C" conventions for fixed and scientific notation values. Values must be within the defined range for "C" doubles.

"string value"

String constants are enclosed in double quotes. Basic XML/HTML entity encoding is used to represent the double quote character itself (as &quot;) and other special characters. The numeric forms &#ddd; (decimal) and &#xhh; (hexadecimal) are recognized, as are the entity names quot, amp, lt, gt, and apos; no other entity names are recognized. Note that you must NOT insert an encoded NUL character (for example &#0;) into a string: the string is converted to a standard "C" NUL-terminated string, so an embedded NUL effectively terminates the string at that point.

:"byte block"

A byte block is specified by preceding a quoted string value with a colon. The length of the block is computed automatically. As with strings, any character that is not valid in a string must be encoded using the XML/HTML-style entity encodings.

#2

Table fields can be specified by numeric position. An integer value preceded by the pound (#) character refers to the record field at that column position. Like array indexes in "C", field numbers are zero based; that is, #0 refers to the first field of the record.

$"first name"

Table fields can be specified by field name. A quoted string value (encoded as described for strings above) preceded by a dollar sign ($) specifies the field name. Variable Attribute qualifiers and table name qualifiers are allowed.

[:"block 1", :"block 2"] or [:]

A comma-separated list of blocks enclosed in square brackets is a block array. To specify a block array with zero entries, write an open bracket followed by a colon and a close bracket, [:], with no space between the colon and the close bracket.
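The encoding rules above are easy to get wrong when predicates are assembled from program data. The following C sketch shows one way to build the literal forms: string constants, $"field name" references, :"byte block" constants, and block arrays. The helper names (pred_string, pred_field, pred_block, pred_block_array, append_raw) are hypothetical and are not part of the LPAR API. It assumes that each &#xhh; stands for a single byte and that encoding <, > and ' (listed above as recognized, though not required) is harmless. It also encodes a zero byte inside a block as &#x00;; the text only forbids encoded NULs in strings, so whether blocks accept one is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Worst case, one input byte becomes 6 characters ("&quot;", "&#x1F;"). */
#define MAX_ENCODED_PER_BYTE 6

/* Write the encoded form of one byte to out (needs room for 7 chars,
 * including the terminator) and return the number of characters written. */
static int encode_byte(unsigned char c, char *out)
{
    switch (c) {
    case '"':  return sprintf(out, "&quot;");
    case '&':  return sprintf(out, "&amp;");
    case '<':  return sprintf(out, "&lt;");   /* assumed optional but harmless */
    case '>':  return sprintf(out, "&gt;");
    case '\'': return sprintf(out, "&apos;");
    default:
        /* Assumption: &#xhh; decodes to exactly one byte. A zero byte can only
         * reach here from pred_block; accepting &#x00; in blocks is assumed. */
        if (c < 0x20 || c >= 0x7F)
            return sprintf(out, "&#x%02X;", (unsigned)c);
        out[0] = (char)c;
        out[1] = '\0';
        return 1;
    }
}

/* Append s to the NUL-terminated buffer out. Returns 0, or -1 if it won't fit. */
int append_raw(char *out, size_t outsz, const char *s)
{
    size_t used = strlen(out), n = strlen(s);
    if (used + n + 1 > outsz)
        return -1;
    memcpy(out + used, s, n + 1);
    return 0;
}

/* Append prefix + '"' + encoded data + '"' to out. The capacity check is
 * conservative: it reserves the worst-case encoded size. */
static int append_literal(char *out, size_t outsz, const char *prefix,
                          const unsigned char *data, size_t len)
{
    assert(out != NULL && prefix != NULL && (data != NULL || len == 0));
    size_t used = strlen(out);
    size_t need = strlen(prefix) + 2 + len * MAX_ENCODED_PER_BYTE + 1;
    if (used >= outsz || outsz - used < need)
        return -1;
    char *p = out + used;
    p += sprintf(p, "%s\"", prefix);
    for (size_t i = 0; i < len; i++)
        p += encode_byte(data[i], p);
    strcpy(p, "\"");
    return 0;
}

/* "string value" (a C string cannot hold NUL, matching the rule above) */
int pred_string(char *out, size_t outsz, const char *s)
{
    return append_literal(out, outsz, "", (const unsigned char *)s, strlen(s));
}

/* $"field name" */
int pred_field(char *out, size_t outsz, const char *name)
{
    return append_literal(out, outsz, "$", (const unsigned char *)name, strlen(name));
}

/* :"byte block" */
int pred_block(char *out, size_t outsz, const unsigned char *data, size_t len)
{
    return append_literal(out, outsz, ":", data, len);
}

/* [:"block 1", :"block 2"], or [:] when n == 0 */
int pred_block_array(char *out, size_t outsz, const unsigned char *const blocks[],
                     const size_t lens[], size_t n)
{
    if (n == 0)
        return append_raw(out, outsz, "[:]");
    if (append_raw(out, outsz, "[") != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && append_raw(out, outsz, ", ") != 0)
            return -1;
        if (pred_block(out, outsz, blocks[i], lens[i]) != 0)
            return -1;
    }
    return append_raw(out, outsz, "]");
}
```

Each helper appends to a buffer that must already be NUL-terminated (for example `char buf[256] = "";`). Calling pred_field with "first name", then append_raw with " = ", then pred_string with `Say "hi"` leaves `$"first name" = "Say &quot;hi&quot;"` in the buffer.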

The table below lists the unary operators and the equivalent LPAR_INT_PREDOP operator described in the previous section. In many cases two or more synonyms are provided for the same operator. Operator names are case insensitive; for example, DATE, date, and Date are the same operator.

String value(s)                               LPAR_INT_PREDOP
--------------------------------------------  --------------------
int                                           PRED_OP_TOINT
dbl, double, float                            PRED_OP_TODBL
date                                          PRED_OP_TODATE
date_time, datet                              PRED_OP_TODATET
eudate, dateeu                                PRED_OP_TODATEEU
eudate_time, dateeu_time, eudatet, dateeut    PRED_OP_TODATEEUT
blk, block                                    PRED_OP_TOBLK
-                                             PRED_OP_MINUS
+                                             none (does nothing)
not                                           PRED_OP_NOT
split, tokenize                               PRED_OP_TOKENIZE
abs                                           PRED_OP_ABS
geod, geodistance, geo_distance               PRED_OP_FUNC_GEOD
if                                            PRED_OP_FUNC_IF
to_score, toscore                             PRED_OP_FUNC_TOSCORE
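Because operator names are case insensitive, a tokenizer needs a case-insensitive lookup from spelling to operator. The following C sketch maps each unary spelling in the table above to the name of its LPAR_INT_PREDOP value, stored as a string. It does not use the library's real constants, and struct op_name and lookup_unary_op are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* One row of the unary operator table. */
struct op_name {
    const char *spelling;  /* as written in a predicate string */
    const char *predop;    /* LPAR_INT_PREDOP name; NULL means "does nothing" */
};

static const struct op_name unary_ops[] = {
    {"int", "PRED_OP_TOINT"},
    {"dbl", "PRED_OP_TODBL"}, {"double", "PRED_OP_TODBL"}, {"float", "PRED_OP_TODBL"},
    {"date", "PRED_OP_TODATE"},
    {"date_time", "PRED_OP_TODATET"}, {"datet", "PRED_OP_TODATET"},
    {"eudate", "PRED_OP_TODATEEU"}, {"dateeu", "PRED_OP_TODATEEU"},
    {"eudate_time", "PRED_OP_TODATEEUT"}, {"dateeu_time", "PRED_OP_TODATEEUT"},
    {"eudatet", "PRED_OP_TODATEEUT"}, {"dateeut", "PRED_OP_TODATEEUT"},
    {"blk", "PRED_OP_TOBLK"}, {"block", "PRED_OP_TOBLK"},
    {"-", "PRED_OP_MINUS"},
    {"+", NULL},
    {"not", "PRED_OP_NOT"},
    {"split", "PRED_OP_TOKENIZE"}, {"tokenize", "PRED_OP_TOKENIZE"},
    {"abs", "PRED_OP_ABS"},
    {"geod", "PRED_OP_FUNC_GEOD"}, {"geodistance", "PRED_OP_FUNC_GEOD"},
    {"geo_distance", "PRED_OP_FUNC_GEOD"},
    {"if", "PRED_OP_FUNC_IF"},
    {"to_score", "PRED_OP_FUNC_TOSCORE"}, {"toscore", "PRED_OP_FUNC_TOSCORE"},
};

/* ASCII case-insensitive equality; strcasecmp is POSIX, not standard C. */
static int equals_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Return the table row for word, or NULL if it is not a unary operator. */
const struct op_name *lookup_unary_op(const char *word)
{
    assert(word != NULL);
    for (size_t i = 0; i < sizeof unary_ops / sizeof unary_ops[0]; i++)
        if (equals_nocase(word, unary_ops[i].spelling))
            return &unary_ops[i];
    return NULL;
}
```

Returning the row rather than the name lets a caller distinguish the unary + (found, but with no operator) from an unknown word (NULL). Since -, +, split, and tokenize also appear among the binary operators, a real tokenizer would also have to use the token's position to decide which table applies.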

The binary operators are listed below. As with the unary operators, named operators are case insensitive.

String value(s)       LPAR_INT_PREDOP
--------------------  ------------------------
+                     PRED_OP_PLUS
-                     PRED_OP_MINUS
*                     PRED_OP_TIMES
/                     PRED_OP_DIVIDEDBY
**                    PRED_OP_TOTHE
and                   PRED_OP_AND
or                    PRED_OP_OR
=, ==                 PRED_OP_EQUALS
~=, ~==               PRED_OP_iEQUALS
<                     PRED_OP_LESSTHAN
~<                    PRED_OP_iLESSTHAN
<=                    PRED_OP_LESSTHANOREQ
~<=                   PRED_OP_iLESSTHANOREQ
>                     PRED_OP_GREATERTHAN
~>                    PRED_OP_iGREATERTHAN
>=                    PRED_OP_GREATERTHANOREQ
~>=                   PRED_OP_iGREATERTHANOREQ
in                    PRED_OP_ISIN
i_in                  PRED_OP_iISIN
superset              PRED_OP_SUPERSET
subset                PRED_OP_SUBSET
split, tokenize       PRED_OP_TOKENIZE
startswith            PRED_OP_SW
i_startswith          PRED_OP_iSW
endswith              PRED_OP_EW
i_endswith            PRED_OP_iEW
like                  PRED_OP_LIKE
i_like                PRED_OP_iLIKE
matches               PRED_OP_REGEX_MATCH
i_matches             PRED_OP_iREGEX_MATCH
isblank, is_blank     PRED_OP_ISBLANK
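The ~ and i_ spellings in this table appear to be case-insensitive variants of the corresponding operators; the first example at the end of this section uses $"sex" ~= "m", which suggests that reading. Assuming it, and assuming simple ASCII case folding with the left operand as the string being tested, the following C sketch shows what = / ~=, startswith / i_startswith, and endswith / i_endswith would compute for two strings. The function names are hypothetical, and the library may fold case (for example, of non-ASCII text) differently.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Compare n bytes of a and b, folding ASCII case when nocase != 0. */
static int bytes_equal(const char *a, const char *b, size_t n, int nocase)
{
    for (size_t i = 0; i < n; i++) {
        int x = (unsigned char)a[i], y = (unsigned char)b[i];
        if (nocase) {
            x = tolower(x);
            y = tolower(y);
        }
        if (x != y)
            return 0;
    }
    return 1;
}

/* a = b, or a ~= b when nocase != 0 */
int pred_equals(const char *a, const char *b, int nocase)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a);
    return la == strlen(b) && bytes_equal(a, b, la, nocase);
}

/* a startswith b, or a i_startswith b when nocase != 0 */
int pred_startswith(const char *a, const char *b, int nocase)
{
    assert(a != NULL && b != NULL);
    size_t lb = strlen(b);
    return strlen(a) >= lb && bytes_equal(a, b, lb, nocase);
}

/* a endswith b, or a i_endswith b when nocase != 0 */
int pred_endswith(const char *a, const char *b, int nocase)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    return la >= lb && bytes_equal(a + (la - lb), b, lb, nocase);
}
```

Under these assumptions pred_equals("Male", "male", 1) is true while pred_equals("Male", "male", 0) is false. The ordering comparisons (~<, ~>= and so on) would fold case the same way before comparing.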

All unary operators have higher precedence than binary operators. For example, BLOCK ABS $"count1" - $"count2" is equivalent to:

( BLOCK ( ABS $"count1" ) ) - $"count2"

which is probably not what was intended. Parentheses can be used to override the default precedence:

BLOCK ABS ( $"count1" - $"count2" )

The binary operators are left associative, e.g.

1 + 2 + 3

is implemented as:

( 1 + 2 ) + 3

They have the standard precedence relations, with the following caveats:

1. + has slightly higher precedence than - (minus), so a - b + c means a - (b + c), not (a - b) + c.
2. Similarly, * has slightly higher precedence than / (divide), so a / b * c means a / (b * c).
3. All comparison operators have the same precedence.

The binary operators, listed by precedence from highest to lowest, are:

1. **
2. *
3. /
4. +
5. -
6. tokenize, split
7. =, ~=, ==, ~==, <, ~<, <=, ~<=, >, ~>, >=, ~>=, in, i_in, superset, subset
8. and
9. or
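Because the rules for + versus - and * versus / differ from C, a small executable model of the arithmetic levels can help. The following C sketch evaluates integer expressions using only **, *, /, +, -, and parentheses, giving each operator its own precedence level in the order listed and making every operator left associative. It is written from the rules stated in this section, not from the library's parser; eval_arith is a hypothetical name, and the sketch has no error recovery (malformed input trips an assert, and division by zero is not checked).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

struct parser { const char *p; };  /* current position in the input */

static void skip_ws(struct parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

/* If a binary operator starts here, store it in *op and its length in *len
 * and return its precedence (5 = highest, per the list above); else 0. */
static int peek_op(struct parser *ps, char *op, int *len)
{
    skip_ws(ps);
    const char *s = ps->p;
    if (s[0] == '*' && s[1] == '*') { *op = '^'; *len = 2; return 5; }
    switch (s[0]) {
    case '*': *op = '*'; *len = 1; return 4;
    case '/': *op = '/'; *len = 1; return 3;
    case '+': *op = '+'; *len = 1; return 2;  /* + binds tighter than - */
    case '-': *op = '-'; *len = 1; return 1;
    default:  return 0;
    }
}

static long ipow(long base, long exp)  /* exp >= 0 assumed */
{
    long r = 1;
    while (exp-- > 0)
        r *= base;
    return r;
}

static long apply(char op, long a, long b)
{
    switch (op) {
    case '^': return ipow(a, b);
    case '*': return a * b;
    case '/': return a / b;
    case '+': return a + b;
    default:  return a - b;
    }
}

static long parse_expr(struct parser *ps, int min_prec);

/* primary := integer | '(' expr ')' */
static long parse_primary(struct parser *ps)
{
    skip_ws(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps, 1);
        skip_ws(ps);
        assert(*ps->p == ')');
        ps->p++;
        return v;
    }
    char *end;
    long v = strtol(ps->p, &end, 10);
    assert(end != ps->p);  /* a number must be present */
    ps->p = end;
    return v;
}

/* Precedence climbing. Every operator is left associative, so the right
 * operand is parsed with a minimum precedence one above the operator's. */
static long parse_expr(struct parser *ps, int min_prec)
{
    long lhs = parse_primary(ps);
    char op = 0;
    int len = 0, prec;
    while ((prec = peek_op(ps, &op, &len)) >= min_prec) {
        ps->p += len;
        lhs = apply(op, lhs, parse_expr(ps, prec + 1));
    }
    return lhs;
}

long eval_arith(const char *expr)
{
    struct parser ps = { expr };
    long v = parse_expr(&ps, 1);
    skip_ws(&ps);
    assert(*ps.p == '\0');  /* the whole input must be consumed */
    return v;
}
```

Under these rules eval_arith("10 - 4 + 3") is 3, that is 10 - (4 + 3), where C would give 9; eval_arith("8 / 2 * 2") is 2 rather than C's 8; and eval_arith("2 ** 3 ** 2") is 64 because ** is left associative here.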

There are no operators for constructing argument lists in the string format, because the string format provides syntax for specifying argument lists directly: an open curly brace ({), followed by a comma-separated list of expressions, followed by a close curly brace (}). For example, the geo-distance function can be expressed as:

geod { $"latitude", $"longitude", 45.0, 75.0, "miles" }

Because function predicates are unary operators, they bind to their argument list with the highest precedence. Thus the expression:

geod { $"latitude", $"longitude", 45.0, 75.0, "miles" } ** 2

is the distance squared, and not the argument list squared.

Examples:

Include only records of males born on or after January 1, 1980:

( $"sex" ~= "m" OR $"sex" ~= "male" ) AND $"birth date" >= date "1/1/1980"

Include only parts in category "tool" with an average price of less than $10:

$"category" ~= "tool" AND ( $"max price" + $"min price" ) / 2.0 < 10.0

Include only those people whose previous and current weights differ by less than 5 lbs:

ABS ( $"cur weight" - $"prev weight" ) < 5.0

Return a score for records within 20 miles of 45.0 degrees latitude and 75.0 degrees longitude:

toscore { geod { $"latitude", $"longitude", 45.0, 75.0, "miles" }, 20.0, 0.0 }
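To pass any of these examples from C as an LPAR_STR_PREDICATE value, as in the lpar_create_str call at the start of this section, every double quote inside the expression must be escaped with a backslash. The array below holds the four examples written that way. The array and the quotes_balanced helper are hypothetical illustrations; quotes_balanced is only a cheap necessary check (each string, field, or block literal contributes two quote characters and an encoded &quot; contributes none), not a parser.

```c
#include <assert.h>
#include <stddef.h>

/* The four example predicates above, as C string literals. */
static const char *const example_predicates[] = {
    /* males born on or after January 1, 1980 */
    "( $\"sex\" ~= \"m\" OR $\"sex\" ~= \"male\" ) AND $\"birth date\" >= date \"1/1/1980\"",
    /* tools with an average price under $10 */
    "$\"category\" ~= \"tool\" AND ( $\"max price\" + $\"min price\" ) / 2.0 < 10.0",
    /* weight changed by less than 5 lbs */
    "ABS ( $\"cur weight\" - $\"prev weight\" ) < 5.0",
    /* score for records within 20 miles of (45.0, 75.0) */
    "toscore { geod { $\"latitude\", $\"longitude\", 45.0, 75.0, \"miles\" }, 20.0, 0.0 }",
};

/* Necessary (not sufficient) check: a well-formed predicate string
 * contains an even number of '"' characters. */
int quotes_balanced(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s; s++)
        if (*s == '"')
            n++;
    return n % 2 == 0;
}
```

Each element could then be handed to lpar_create_str(LPAR_STR_PREDICATE, ...) exactly as in the first example of this section.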