Spark SQL Syntax and Expressions

The following table lists the Spark SQL syntax and expressions for Numbers:

Function Description Example
ABS() Computes the absolute value. ABS(-3.2) returns 3.2
CEIL() Computes the ceiling of the given value. CEIL(2.4) returns 3
EXP() Computes the exponential of the given value. EXP(1) returns 2.71828
FLOOR() Computes the floor of the given value. FLOOR(4.6) returns 4
LOG() Computes the natural logarithm of the given value. LOG(1) returns 0
RAND() Generate a random column with independence and identically distributed (i.i.d.) samples from U[0.0, 1.0]. RAND() returns [0.0,1.0]
ROUND() Returns the value of the column e rounded to 0 decimal places. ROUND(2.4) returns 2
SQRT() Computes the square root of the specified float value. SQRT(4) returns 2

The following table lists the Spark SQL syntax and expressions for Strings:

Function Description Example
CONCAT() Concatenates multiple input strings together into a single string CONCAT('a', 'b') returns ab
INSTR(str, String substring) Locate the position of the first occurrence of a substring in the given string. INSTR('entree', 'e') returns 1
LOCATE(String substr, str, int pos) Locate the position of the first occurrence of a substring. LOCATE('e','entree', 2) returns 5
LOWER(str) Converts a string to lowercase. LOWER('Alpine') returns alpine
UPPER(str) Converts a string to uppercase. UPPER('Alpine') returns ALPINE
REVERSE(str) Reverses the string and returns it as a new string. REVERSE('foobar') returns raboof
SUBSTRING(str, int pos, int len) Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type. SUBSTRING('Alpine', 2, 2) returns lp
TRIM(str) Trim the spaces from both ends for the specified string. TRIM(' a b ') returns a b
LENGTH(str) Computes the length of a given string. LENGTH('foobar') returns 6
TRANSLATE(src, String matchingString, String replaceString) Translate any character in the src by a character in replaceString. TRANSLATE('Alpine', 'Alp', 'F') returns Fine
REGEXP_EXTRACT(e, String exp, int groupIdx) Extract a specific group matched by a Java regex, from the specified string. REGEXP_EXTRACT('this:that', '(.*)\\:(.*)', 2) returns that
REGEXP_REPLACE(e, String pattern, String replacement) Replace all substrings of the specified string value that match regexp with rep. REGEXP_REPLACE("foo bar", "oo|ar|\\s", "") returns fb

The following table lists the Spark SQL syntax and expressions for Date-Timestamps:

Function Description Example
DAYOFMONTH() Extracts the day of the month as an integer from a given date/timestamp/string. DAYOFMONTH('2003-05-10') returns 10
HOUR() Extracts the hours as an integer from a given date/timestamp/string. HOUR('2003-10-12 15:02') returns 15
MINUTE() Extracts the minutes as an integer from a given date/timestamp/string. MINUTE('2003-10-04 05:12:23') returns 12
DATEDIFF( DATE_ADD(('datetime'), 2), 'datetime') Returns the difference in date. DATEDIFF( DATE_ADD(('2012-10-17'), 2), '2012-10-17') returns 2
CAST(TO_DATE(datetime_column) AS STRING) Returns the input date as string in format 'yyyy-MM-dd'. CAST(TO_DATE(2012-10-12) AS STRING) returns '2012-10-12'
CAST(UNIX_TIMESTAMP('datetime_column') AS BIGINT) Returns local Unix time in seconds. CAST(UNIX_TIMESTAMP('2003-01-29') AS BIGINT) returns 1043827200
Note:

Spark SQL only supports the format 'yyyy-MM-dd HH:mm:ss' for timestamps and 'yyyy-MM-dd' for dates. All input datetime columns are implicitly converted to this timestamp format when loading the data frame and converted to the initial (or new specified) format when writing it back to the TIBCO Data Virtualization.

If you are using strings with a different datetime format, then the format needs to be explicitly cast to the correct timestamp format: UNIX_TIMESTAMP('03/03/2012', 'MM/dd/yyyy').

The following lists the Spark SQL syntax and expressions for Conditionals:

  • CASE WHEN LOWER(referring_url) LIKE 'www%.org%' THEN 'Non-Profit' ELSE 'Other' END

  • CASE WHEN LOWER(referring_url) RLIKE '^www.*\\.org$' THEN 'Non-Profit' ELSE 'Other' END

For more information, refer to the Spark SQL documentation.