Predicate String Expressions
For predicates constructed on the fly, the LPAR form described in the previous section may be appropriate, but in other cases the string format described here can be more convenient to use and easier to understand. The string form is passed in using the string type lpar LPAR_STR_PREDICATE or the block type lpar LPAR_BLK_PREDICATE. These lpars can be used anywhere an LPAR_LST_PREDICATE lpar can be used. So the above example of a predicate to restrict searches to records dated after January 3, 2002 could also be expressed as:
lpar_t predicate;
predicate = lpar_create_str(LPAR_STR_PREDICATE,
"$\"Creation Date\" > DATE \"January 3, 2002\""
);
The predicate block form (LPAR_BLK_PREDICATE) is identical to the string form. It is provided as a means of handling very long expressions: an LPAR string is limited to about 1,000 characters in length, whereas a block has no length restriction. The null-terminated string form is easier to work with, however, so it makes more sense for most cases. The block form can be used anywhere the string form is used, even if it is not explicitly mentioned in the documentation. Anywhere this documentation mentions LPAR_STR_PREDICATE, it should be understood that LPAR_BLK_PREDICATE can also be used.
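Because field names and string constants are written in double quotes, composing a predicate in C means escaping those quotes in the source. The following is a minimal sketch that uses only the standard C library. The helper names `build_date_predicate` and `fits_string_form` are hypothetical, and `PREDICATE_STR_LIMIT` is an assumed stand-in for the "about 1,000 characters" limit, since the exact value is not stated here. The sketch does not encode double quotes that appear inside the field name or date.

```c
#include <stdio.h>
#include <string.h>

/* Assumed stand-in for the "about 1,000 characters" LPAR string limit. */
#define PREDICATE_STR_LIMIT 1000

/*
 * Hypothetical helper: writes  $"<field>" > DATE "<date>"  into buf.
 * Returns the expression length, or -1 if the buffer was too small
 * or formatting failed.
 */
int build_date_predicate(char *buf, size_t size,
                         const char *field, const char *date)
{
    int n = snprintf(buf, size, "$\"%s\" > DATE \"%s\"", field, date);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Returns 1 if the expression is short enough for the string form. */
int fits_string_form(const char *expr)
{
    return strlen(expr) < PREDICATE_STR_LIMIT;
}
```

A buffer filled this way could then be handed to lpar_create_str(LPAR_STR_PREDICATE, ...) as in the example above. When `fits_string_form` returns 0, the block form is the alternative.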
A predicate string is similar to an SQL WHERE clause expression. It is built from unary and binary operators and data elements. A data element is a field of a record or a string, block, numeric, or Boolean constant. The data elements are:
| Examples | Description |
| ?TRUE?, ?FALSE? | Boolean constant values are true and false enclosed in question marks. Letter case insensitive. |
| 123, 0777, 0x8FFF | Integer constant values. They follow the standard "C" conventions for decimal, octal and hexadecimal integers. Values must be within the defined range for integers. |
| 123.45, 0.17e-10 | Floating point values. They follow the standard "C" conventions for fixed and scientific notation values. Values must be within the defined range for "C" doubles. |
| "string value" | String constants are enclosed in double quotes. The basic XML/HTML entity encoding scheme is used to represent the double quote character itself (i.e. |
| :"byte block" | A byte block is specified by preceding a quoted string value with a colon character. The length of the block is computed automatically. As with strings all non-valid string characters must be encoded using the XML/HTML like encodings. |
| #2 | Table fields can be specified by numeric position. An integer value preceded with the pound (#) character is used to represent the record field at the indicated column position. Like array indexes in "C" these field numbers are zero based. That is #0 refers to the first field of the record. |
| $"first name" | Table fields can be specified by field name. A quoted string value (see string description above) preceded by the dollar sign ($) is used to specify the field name. Variable Attribute qualifiers and table name qualifiers are allowed. |
| [:"block 1", :"block 2"] or [:] | A comma separated list of blocks enclosed in square brackets is a block array. To specify a block array with zero entries, use an open bracket followed by a colon close bracket (:]) no space between the colon and close bracket. |
The table below shows the unary operators and the equivalent LPAR_INT_PREDOP operator described above. In many cases two or more synonyms are provided for the same operator. Operator names are not case sensitive; for example, DATE, date, and Date are considered to be the same operator.
| string value | LPAR_INT_PREDOP |
| int | PRED_OP_TOINT |
| dbl, double, float | PRED_OP_TODBL |
| date | PRED_OP_TODATE |
| date_time, datet | PRED_OP_TODATET |
| eudate, dateeu | PRED_OP_TODATEEU |
| eudate_time, dateeu_time, eudatet, dateeut | PRED_OP_TODATEEUT |
| blk, block | PRED_OP_TOBLK |
| - | PRED_OP_MINUS |
| + | none, this does nothing |
| not | PRED_OP_NOT |
| split, tokenize | PRED_OP_TOKENIZE |
| abs | PRED_OP_ABS |
| geod, geodistance, geo_distance | PRED_OP_FUNC_GEOD |
| if | PRED_OP_FUNC_IF |
| to_score, toscore | PRED_OP_FUNC_TOSCORE |
The binary operators are listed below. As with the unary operators, named operators are not case sensitive.
| string value | LPAR_INT_PREDOP |
| + | PRED_OP_PLUS |
| - | PRED_OP_MINUS |
| * | PRED_OP_TIMES |
| / | PRED_OP_DIVIDEDBY |
| ** | PRED_OP_TOTHE |
| and | PRED_OP_AND |
| or | PRED_OP_OR |
| =, == | PRED_OP_EQUALS |
| ~=, ~== | PRED_OP_iEQUALS |
| < | PRED_OP_LESSTHAN |
| ~< | PRED_OP_iLESSTHAN |
| <= | PRED_OP_LESSTHANOREQ |
| ~<= | PRED_OP_iLESSTHANOREQ |
| > | PRED_OP_GREATERTHAN |
| ~> | PRED_OP_iGREATERTHAN |
| >= | PRED_OP_GREATERTHANOREQ |
| ~>= | PRED_OP_iGREATERTHANOREQ |
| in | PRED_OP_ISIN |
| i_in | PRED_OP_iISIN |
| superset | PRED_OP_SUPERSET |
| subset | PRED_OP_SUBSET |
| split | PRED_OP_TOKENIZE |
| tokenize | PRED_OP_TOKENIZE |
| startswith | PRED_OP_SW |
| i_startswith | PRED_OP_iSW |
| endswith | PRED_OP_EW |
| i_endswith | PRED_OP_iEW |
| like | PRED_OP_LIKE |
| i_like | PRED_OP_iLIKE |
| matches | PRED_OP_REGEX_MATCH |
| i_matches | PRED_OP_iREGEX_MATCH |
| isblank, is_blank | PRED_OP_ISBLANK |
All unary operators have higher precedence than binary operators. That is: BLOCK ABS $"count1" - $"count2" is equivalent to:
( BLOCK ( ABS $"count1" ) ) - $"count2"
which is probably not what was intended. Parentheses can be used to alter the default precedence relations:
BLOCK ABS ( $"count1" - $"count2" )
The binary operators are left associative, e.g.
1 + 2 + 3
is evaluated as:
( 1 + 2 ) + 3
They have the standard precedence relations with the following caveats:
| 1. | + has slightly higher precedence than - (minus), thus a - b + c is a - (b + c) not (a - b) + c |
| 2. | similarly * has slightly higher precedence than / (divide) |
| 3. | All comparison operators have the same precedence. |
The binary operators listed by precedence from highest to lowest are:
| 1. | ** |
| 2. | * |
| 3. | / |
| 4. | + |
| 5. | - |
| 6. | tokenize, split |
| 7. | =, ~=, ==, ~==, <, ~<, <=, ~<=, >, ~>, >=, ~>=, in, i_in, superset, subset |
| 8. | and |
| 9. | or |
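Because + binds slightly tighter than - and * slightly tighter than /, some expressions evaluate differently than they would in C. The toy evaluator below is a sketch of those rules only and is not the library's parser. It handles numeric constants, parentheses, and the four operators +, -, * and /, and it leaves out ** and every non-arithmetic operator. All names in it, such as `eval_arith`, are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* The four basic arithmetic operators; a larger level binds tighter. */
enum { LVL_NONE = 0, LVL_SUB = 1, LVL_ADD = 2, LVL_DIV = 3, LVL_MUL = 4 };

static void skip_ws(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
}

/* Returns the level of the operator at *p without consuming it. */
static int peek_level(const char **p)
{
    skip_ws(p);
    switch (**p) {
    case '*': return LVL_MUL;
    case '/': return LVL_DIV;
    case '+': return LVL_ADD;
    case '-': return LVL_SUB;
    default:  return LVL_NONE;
    }
}

static double parse_expr(const char **p, int min_level);

/* A primary is a number or a parenthesized expression. */
static double parse_primary(const char **p)
{
    skip_ws(p);
    if (**p == '(') {
        (*p)++;
        double v = parse_expr(p, LVL_SUB);
        skip_ws(p);
        if (**p == ')')
            (*p)++;
        return v;
    }
    char *end;
    double v = strtod(*p, &end);
    *p = end;
    return v;
}

static double parse_expr(const char **p, int min_level)
{
    double lhs = parse_primary(p);
    for (;;) {
        int lvl = peek_level(p);
        if (lvl == LVL_NONE || lvl < min_level)
            return lhs;
        (*p)++;  /* consume the one-character operator */
        /* Left associative: the right operand may only use tighter operators. */
        double rhs = parse_expr(p, lvl + 1);
        switch (lvl) {
        case LVL_MUL: lhs *= rhs; break;
        case LVL_DIV: lhs /= rhs; break;
        case LVL_ADD: lhs += rhs; break;
        default:      lhs -= rhs; break;
        }
    }
}

/* Evaluates an arithmetic expression under the precedence rules above. */
double eval_arith(const char *expr)
{
    return parse_expr(&expr, LVL_SUB);
}
```

Under these rules `eval_arith("10 - 2 + 3")` yields 5, because the expression groups as 10 - (2 + 3), and `eval_arith("8 / 2 * 2")` yields 2, because it groups as 8 / (2 * 2). Adding parentheses, as in `(10 - 2) + 3`, restores the conventional result of 11.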
There are no operators for constructing argument lists in the string format. This is because the string format provides a syntactic means of specifying argument lists directly. An argument list is specified as an open curly brace ({) followed by a comma separated list of expressions, followed by a close curly brace (}). For example, the following is how the geo-distance function can be expressed:
geod { $"latitude", $"longitude", 45.0, 75.0, "miles" }
As function predicates are unary operators, they bind to their argument list at the highest precedence. Thus the expression:
geod { $"latitude", $"longitude", 45.0, 75.0, "miles" } ** 2
is the distance squared, and not the argument list squared.
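When the coordinates are only known at run time, the argument list has to be assembled as text. The following minimal sketch uses only the standard C library, and the helper name `format_geod` is hypothetical. It prints the numeric arguments with one decimal place to match the example above. Real coordinates would normally need more digits.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes
 *   geod { $"<lat_field>", $"<lon_field>", <lat>, <lon>, "<units>" }
 * into buf. Returns the length written, or -1 if the buffer was too
 * small or formatting failed.
 */
int format_geod(char *buf, size_t size,
                const char *lat_field, const char *lon_field,
                double lat, double lon, const char *units)
{
    /* One decimal place is an assumption chosen to mirror the example. */
    int n = snprintf(buf, size,
                     "geod { $\"%s\", $\"%s\", %.1f, %.1f, \"%s\" }",
                     lat_field, lon_field, lat, lon, units);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Because the function call binds to its argument list at the highest precedence, the generated text can be followed directly by further operators, such as ** 2, without extra parentheses.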
Examples:
| • | Include only records of males born on or after January 1st, 1980: |
( $"sex" ~= "m" OR $"sex" ~= "male" ) AND $"birth date" >= date "1/1/1980"
| • | Include only parts in category "tool" with an average price less than $10: |
$"category" ~= "tool" AND ( $"max price" + $"min price" ) / 2.0 < 10.0
| • | Include only those people whose previous and current weight differ by less than 5 lbs. |
ABS ( $"cur weight" - $"prev weight" ) < 5.0
| • | Return a score for records within 20 miles of 45.0 degrees latitude and 75.0 degrees longitude. |
toscore { geod { $"latitude", $"longitude", 45.0, 75.0, "miles" }, 20.0, 0.0 }