REGEXP_SUBSTR: Returning the First Match to a Pattern in a String
REGEXP_SUBSTR returns a string that contains the first match to a specified regular expression pattern within a source string. If there is no match within the source string, a null string is returned.
Returning the First Match to a Pattern in a String
REGEXP_SUBSTR(string, pattern)
where:
Alphanumeric
Is the input string to be searched.
Alphanumeric
Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported
- . represents any single character
- * represents zero or more occurrences
- + represents one or more occurrences
- ? represents zero or one occurrence
- ^ represents beginning of line
- $ represents the end of the line
- [] represents any one character in the set listed within the brackets
- [^] represents any one character not in the set listed within the brackets
- | represents the Or operator
- \ is the Escape Special Character
- () contains a character sequence
Returning the First Match of a Pattern in a String
The following example uses the following Regular Expression symbols.
- [A-Z], which matches any uppercase letter.
- [a-z], which matches any lowercase letter.
- $, which searches for a specified expression that occurs at the end of a string.
REGEXP_SUBSTR searches for a string with any uppercase letter followed by the characters 'umpty' at the end of the string 'Humpty Dumpty'.
REGEXP_SUBSTR('Humpty Dumpty', '[A-Z]umpty$')
The result is 'Dumpty'.
In the following request, REG1 contains the first instance of a string within the REGION field value that starts with an uppercase letter, followed by any number of lowercase letters, followed by the characters 'west'. REG2 contains the first instance of a string within the REGION field value that starts with an uppercase letter, followed by any number of lowercase letters, followed by the characters 'east'.
TABLE FILE GGSALES SUM DOLLARS NOPRINT AND COMPUTE REG1/A25 = REGEXP_SUBSTR(REGION, '[A-Z][a-z]*west'); REG2/A25 = REGEXP_SUBSTR(REGION, '[A-Z][a-z]*east'); BY REGION ON TABLE SET PAGE NOLEAD ON TABLE SET STYLE * GRID=OFF,$ ENDSTYLE END
The output is shown in the following image.

The following version of the request runs on z/OS, where the regular expression is generated as a variable, using the CHAR function to insert the meta-characters. Note that the asterisk meta-character (*) needs to be represented as CHAR(92).
-SET ®1=CHAR(173) || 'A-Z' || CHAR(189) || CHAR(173) || 'a-z' - || CHAR(189) || CHAR(92) || 'west' ; -SET ®2=CHAR(173) || 'A-Z' || CHAR(189) || CHAR(173) || 'a-z' - || CHAR(189) || '*east'; TABLE FILE GGSALES SUM DOLLARS NOPRINT AND COMPUTE REG1/A25 = REGEXP_SUBSTR(REGION,'®1'); REG2/A25 = REGEXP_SUBSTR(REGION,'®2'); BY REGION ON TABLE SET PAGE NOLEAD ON TABLE SET STYLE * GRID=OFF,$ ENDSTYLE END
The output follows.
Region REG1 REG2 ------ ---- ---- Midwest Midwest Northeast Northeast Southeast Southeast West