REGEXP_SUBSTR: Returning the First Match to a Pattern in a String

REGEXP_SUBSTR returns a string that contains the first match to a specified regular expression pattern within a source string. If there is no match within the source string, a null string is returned.

Returning the First Match to a Pattern in a String

REGEXP_SUBSTR(string, pattern)

where:

string

Alphanumeric

Is the input string to be searched.

pattern

Alphanumeric

Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported

  • . represents any single character
  • * represents zero or more occurrences
  • + represents one or more occurrences
  • ? represents zero or one occurrence
  • ^ represents beginning of line
  • $ represents the end of the line
  • [] represents any one character in the set listed within the brackets
  • [^] represents any one character not in the set listed within the brackets
  • | represents the Or operator
  • \ is the Escape Special Character
  • () contains a character sequence

Returning the First Match of a Pattern in a String

The following example uses the following Regular Expression symbols.

  • [A-Z], which matches any uppercase letter.
  • [a-z], which matches any lowercase letter.
  • $, which searches for a specified expression that occurs at the end of a string.

REGEXP_SUBSTR searches for a string with any uppercase letter followed by the characters 'umpty' at the end of the string 'Humpty Dumpty'.

REGEXP_SUBSTR('Humpty Dumpty', '[A-Z]umpty$')

The result is 'Dumpty'.

In the following request, REG1 contains the first instance of a string within the REGION field value that starts with an uppercase letter, followed by any number of lowercase letters, followed by the characters 'west'. REG2 contains the first instance of a string within the REGION field value that starts with an uppercase letter, followed by any number of lowercase letters, followed by the characters 'east'.

TABLE FILE GGSALES
SUM DOLLARS NOPRINT AND COMPUTE
REG1/A25 = REGEXP_SUBSTR(REGION, '[A-Z][a-z]*west');
REG2/A25 = REGEXP_SUBSTR(REGION, '[A-Z][a-z]*east');
BY REGION
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

The following version of the request runs on z/OS, where the regular expression is generated as a variable, using the CHAR function to insert the meta-characters. Note that the asterisk meta-character (*) needs to be represented as CHAR(92).

-SET &REG1=CHAR(173) || 'A-Z' || CHAR(189) || CHAR(173) || 'a-z'
- || CHAR(189) || CHAR(92) || 'west' ; 
-SET &REG2=CHAR(173) || 'A-Z' || CHAR(189)  || CHAR(173) || 'a-z' 
- || CHAR(189) || '*east';    
TABLE FILE GGSALES 
SUM DOLLARS NOPRINT AND COMPUTE    
REG1/A25 = REGEXP_SUBSTR(REGION,'&REG1');  
REG2/A25 = REGEXP_SUBSTR(REGION,'&REG2');   
BY REGION   
ON TABLE SET PAGE NOLEAD 
ON TABLE SET STYLE *     
GRID=OFF,$  
ENDSTYLE  
END   

The output follows.

Region       REG1                       REG2     
------       ----                       ----     
Midwest      Midwest                             
Northeast                               Northeast
Southeast                               Southeast
West