Spark SQL Syntax and Expressions
The following table lists the Spark SQL syntax and expressions for Numbers:
Function | Description | Example |
---|---|---|
ABS()
|
Computes the absolute value. | ABS(-3.2) returns 3.2 |
CEIL()
|
Computes the ceiling of the given value. | CEIL(2.4) returns 3 |
EXP()
|
Computes the exponential of the given value. | EXP(1) returns 2.71828 |
FLOOR()
|
Computes the floor of the given value. | FLOOR(4.6) returns 4 |
LOG()
|
Computes the natural logarithm of the given value. | LOG(1) returns 0 |
RAND()
|
Generate a random column with independence and identically distributed (i.i.d.) samples from U[0.0, 1.0]. | RAND() returns [0.0,1.0] |
ROUND()
|
Returns the value of the column e rounded to 0 decimal places. | ROUND(2.4) returns 2 |
SQRT()
|
Computes the square root of the specified float value. | SQRT(4) returns 2 |
The following table lists the Spark SQL syntax and expressions for Strings:
Function | Description | Example |
---|---|---|
CONCAT()
|
Concatenates multiple input strings together into a single string | CONCAT('a', 'b') returns ab |
INSTR(str, String substring)
|
Locate the position of the first occurrence of a substring in the given string. | INSTR('entree', 'e') returns 1 |
LOCATE(String substr, str, int pos)
|
Locate the position of the first occurrence of a substring. | LOCATE('e','entree', 2) returns 5 |
LOWER(str)
|
Converts a string to lowercase. | LOWER('Alpine') returns alpine |
UPPER(str)
|
Converts a string to uppercase. | UPPER('Alpine') returns ALPINE |
REVERSE(str)
|
Reverses the string and returns it as a new string. | REVERSE('foobar') returns raboof |
SUBSTRING(str, int pos, int len)
|
Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type. |
SUBSTRING('Alpine', 2, 2) returns lp |
TRIM(str)
|
Trim the spaces from both ends for the specified string. | TRIM(' a b ') returns a b |
LENGTH(str)
|
Computes the length of a given string. | LENGTH('foobar') returns 6 |
TRANSLATE(src, String matchingString, String replaceString)
|
Translate any character in the src by a character in replaceString . |
TRANSLATE('Alpine', 'Alp', 'F') returns Fine |
REGEXP_EXTRACT(e, String exp, int groupIdx)
|
Extract a specific group matched by a Java regex, from the specified string. | REGEXP_EXTRACT('this:that', '(.*)\\:(.*)', 2) returns that |
REGEXP_REPLACE(e, String pattern, String replacement)
|
Replace all substrings of the specified string value that match regexp with rep. | REGEXP_REPLACE("foo bar", "oo|ar|\\s", "") returns fb |
The following table lists the Spark SQL syntax and expressions for Date-Timestamps:
Function | Description | Example |
---|---|---|
DAYOFMONTH()
|
Extracts the day of the month as an integer from a given date/timestamp/string. | DAYOFMONTH('2003-05-10' ) returns 10 |
HOUR()
|
Extracts the hours as an integer from a given date/timestamp/string. | HOUR('2003-10-12 15:02') returns 15 |
MINUTE()
|
Extracts the minutes as an integer from a given date/timestamp/string. | MINUTE('2003-10-04 05:12:23') returns 12 |
DATEDIFF( DATE_ADD(('datetime'), 2), 'datetime') | Returns the difference in date. | DATEDIFF( DATE_ADD(('2012-10-17'), 2), '2012-10-17') returns 2 |
CAST(TO_DATE(datetime_column) AS STRING)
|
Returns the input date as string in format 'yyyy-MM-dd'. | CAST(TO_DATE(2012-10-12) AS STRING) returns '2012-10-12' |
CAST(UNIX_TIMESTAMP('datetime_column') AS BIGINT)
|
Returns local Unix time in seconds. | CAST(UNIX_TIMESTAMP('2003-01-29') AS BIGINT) returns 1043827200 |
Note:
Spark SQL only supports the format ' If you are using strings with a different |
The following lists the Spark SQL syntax and expressions for Conditionals:
-
CASE WHEN LOWER(referring_url) LIKE 'www%.org%' THEN 'Non-Profit' ELSE 'Other' END
-
CASE WHEN LOWER(referring_url) RLIKE '^www.*\\.org$' THEN 'Non-Profit' ELSE 'Other' END
For more information, refer to the Spark SQL documentation.