Regular Expression Functions

In this section:

Using Regular Expressions on z/OS
REGEX: Matching a String to a Regular Expression
REGEXP_COUNT: Counting the Number of Matches to a Pattern in a String
REGEXP_INSTR: Returning the First Position of a Pattern in a String
REGEXP_REPLACE: Replacing All Matches to a Pattern in a String
REGEXP_SUBSTR: Returning the First Match to a Pattern in a String

A regular expression is a sequence of special meta-characters and literal characters that you can combine to form a search pattern.

Note: You can search online for information about the symbols used to create a regular expression pattern. For example, Wikipedia has a good introduction at:

https://en.wikipedia.org/wiki/Regular_expression

The following list summarizes common meta-characters used in regular expressions.

. represents any single character
* represents zero or more occurrences
+ represents one or more occurrences
? represents zero or one occurrence
^ represents beginning of line
$ represents end of line
[] represents any one character in the set listed within the brackets
[^] represents any one character not in the set listed within the brackets
| represents the Or operator
\ is the Escape Special Character
() contains a character sequence

For example, the regular expression '^Ste(v|ph)en$' matches values starting with Ste followed by either ph or v, and ending with en.

Using Regular Expressions on z/OS

On z/OS, depending on the code page you are using, some of the meta-characters used to create a regular expression may not be interpreted correctly when inserted directly from a Windows keyboard.

If you are using the Unicode code page 65002, the meta-characters will be interpreted correctly. In this environment, you need to be sure the files you are referencing, such as FOCUS data sources, have been built using this code page.

If you are not using a Unicode code page, you can use the CHAR function to return the correct meta-characters, based on the decimal code for the EBCDIC character. For example, to insert:

The circumflex (^) meta-character, use CHAR(95).
The left bracket ([) meta-character, use CHAR(173).
The right bracket (]) meta-character, use CHAR(189).
The left brace ({) meta-character, use CHAR(192).
The right brace (}) meta-character, use CHAR(208).

Create a Dialogue Manager variable that contains the pattern. To insert the meta-characters, use the CHAR function, and then use that variable as the argument in the regular expression function. For example, to generate the regular expression '[AEIOUaeiou]', which matches all uppercase and lowercase vowels, issue a -SET command similar to the following, which creates a variable named &VCWSTRING:

-SET &VCWSTRING=CHAR(173) || 'AEIOUaeiou' || CHAR(189);

Then use the &VCWSTRING variable as the regular expression argument in the function call. For example:

VowelCnt/I5=REGEXP_COUNT(PRODUCT, '&VCWSTRING');

REGEX: Matching a String to a Regular Expression

How to:

Match a String to a Regular Expression

The REGEX function matches a string to a regular expression and returns true (1) if it matches and false (0) if it does not match.

A regular expression is a sequence of special characters and literal characters that you can combine to form a search pattern.

Many references for regular expressions exist on the web.

For a basic summary, see the section Summary of Regular Expressions in Chapter 2, Security, of the Server Administration manual.

Syntax: How to Match a String to a Regular Expression

REGEX(string, regular_expression)

where:

string

Alphanumeric

Is the character string to match.

regular_expression

Alphanumeric

Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported

. represents any single character
* represents zero or more occurrences
+ represents one or more occurrences
? represents zero or one occurrence
^ represents beginning of line
$ represents end of line
[] represents any one character in the set listed within the brackets
[^] represents any one character not in the set listed within the brackets
| represents the Or operator
\ is the Escape Special Character
() contains a character sequence

For example, the regular expression '^Ste(v|ph)en$' matches values starting with Ste followed by either ph or v, and ending with en.

Note: The output value is numeric.

Example: Matching a String Against a Regular Expression

The following request matches the FIRSTNAME field against the regular expression 'PATRIC[(I?)K]', which matches PATRICIA or PATRICK:

DEFINE FILE VIDEOTRK   
PNAME/I5=REGEX(FIRSTNAME,'PATRIC[(I?)K]');                   
END                                                                   
TABLE FILE VIDEOTRK  
PRINT FIRSTNAME PNAME
BY LASTNAME
WHERE LASTNAME GE 'M'
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

The following version of the request runs on z/OS. The variable &REG1 contains the regular expression string with the circumflex character (^) inserted as CHAR(95), the left bracket character ([) inserted as CHAR(173), and the right bracket character (]) inserted as CHAR(189). The other meta-characters are interpreted correctly.

-SET &REG1 = CHAR(95) || 'PATRIC' || CHAR(173) || 
- '(I?)K' || CHAR(189);                                 
DEFINE FILE VIDEOTRK                                    
PNAME/I5 = REGEX(FIRSTNAME,'&REG1') ;                   
END                                         
TABLE FILE VIDEOTRK                                 
PRINT FIRSTNAME PNAME    
BY LASTNAME                  
WHERE LASTNAME GE 'M' 
ON TABLE SET PAGE NOLEAD
 END

The output follows.

  LASTNAME         FIRSTNAME   PNAME 
  --------         ---------   ----- 
  MCMAHON          JOHN            0 
  MONROE           CATHERINE       0 
                   PATRICK         1 
  NON-MEMBER                       0 
  O'BRIEN          DONALD          0 
  PARKER           GLENDA          0 
                   RICHARD         0 
  RATHER           MICHAEL         0 
  RIESLER          LESLIE          0 
  SPIVEY           TOM             0 
  STANDLER         MICHAEL         0 
  STEWART          MAUDE           0 
  WHITE            PATRICIA        1 
  WILLIAMS         KENNETH         0 
  WILSON           KELLY           0 
  WU               MARTHA          0

REGEXP_COUNT: Counting the Number of Matches to a Pattern in a String

How to:

Count the Number of Matches to a Pattern in a String

REGEXP_COUNT returns the integer count of matches to a specified regular expression pattern within a source string.

Syntax: How to Count the Number of Matches to a Pattern in a String

REGEXP_COUNT(string, pattern)

where:

string

Alphanumeric

Is the input string to be searched.

pattern

Alphanumeric

Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported

. represents any single character
* represents zero or more occurrences
+ represents one or more occurrences
? represents zero or one occurrence
^ represents beginning of line
$ represents end of line
[] represents any one character in the set listed within the brackets
[^] represents any one character not in the set listed within the brackets
| represents the Or operator
\ is the Escape Special Character
() contains a character sequence

Example: Counting the Number of Matches to a Pattern in a String

The following example uses the following Regular Expression symbols.

$, which searches for a specified expression that occurs at the end of a string.
^, which searches for a specified expression that occurs at the beginning of a string.
\s*, which matches any number of whitespace characters, such as blank characters.
[T,t], which matches the characters 'T' and 't'.

In the following request, REG1 is the number of occurrences of the expression 'iscotti', with any number of following whitespace characters, that occur the end of the PRODUCT field. REG2 is the number of occurrences of the characters 'T' and 't' in the PRODUCT field.

TABLE FILE GGSALES
SUM DOLLARS AND COMPUTE
REG1/I5 = REGEXP_COUNT(PRODUCT, 'iscotti\s*$');
REG2/I5 = REGEXP_COUNT(PRODUCT, '[T,t]');
BY PRODUCT
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

Example: Using REGEXP_COUNT on Windows and z/OS

The following request uses REGEXP_COUNT to return the number of vowels and number of consonants in each product name, on Windows. VowelCnt is the count of vowels, and ConsonantCnt is the count of non-vowels.

DEFINE FILE GGSALES
VowelCnt/I5=REGEXP_COUNT(PRODUCT,'[AEIOUaeiou]');
ConsonantCnt/I5=REGEXP_COUNT(PRODUCT,'[^AEIOUaeiou]');
END
TABLE FILE GGSALES
SUM MAX.VowelCnt AS 'Vowels' 
    MAX.ConsonantCnt AS 'Consonants'
BY PRODUCT
END

Note:

Brackets are used to enclose a list of characters that will match the regular expression pattern.
When the circumflex character (^) prefaces the list of characters within the brackets, the regular expression matches any character not on the list.

The output is shown in the following image.

The following version of the request uses REGEXP_COUNT to return the number of vowels and number of consonants in each product name, on z/OS. The -SET commands create the regular expressions by using the CHAR function to insert the meta-characters into the expressions. VowelCnt is the count of vowels, and ConsonantCnt is the count of non-vowels.

-SET &VCWSTRING=CHAR(173) || 'AEIOUaeiou' || CHAR(189);               
-SET &CONSTRING=CHAR(173) || CHAR(95) || 'AEIOU aeiou' || CHAR(189);  
DEFINE FILE GGSALES                                                   
VowelCnt/I5=REGEXP_COUNT(PRODUCT, '&VCWSTRING');                      
ConsonantCnt/I5=REGEXP_COUNT(PRODUCT,'&CONSTRING');                   
END                                                                   
TABLE FILE GGSALES                                                    
SUM MAX.VowelCnt AS 'Vowels'                                          
    MAX.ConsonantCnt AS 'Consonants'                                  
BY PRODUCT   
ON TABLE SET PAGE NOLEAD    
END

The output follows.

  Product           Vowels  Consonants 
  -------           ------  ---------- 
  Biscotti               3           5 
  Capuccino              4           5 
  Coffee Grinder         5           8 
  Coffee Pot             4           5 
  Croissant              3           6 
  Espresso               3           5 
  Latte                  2           3 
  Mug                    1           2 
  Scone                  2           3 
  Thermos                2           5

REGEXP_INSTR: Returning the First Position of a Pattern in a String

How to:

Return the Position of a Pattern in a String

REGEXP_INSTR returns the integer position of the first match to a specified regular expression pattern within a source string. The first character position in a string is indicated by the value 1. If there is no match within the source string, the value 0 is returned.

Syntax: How to Return the Position of a Pattern in a String

REGEXP_INSTR(string, pattern)

where:

string

Alphanumeric

Is the input string to be searched.

pattern

Alphanumeric

Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported

. represents any single character
* represents zero or more occurrences
+ represents one or more occurrences
? represents zero or one occurrence
^ represents beginning of line
$ represents end of line
[] represents any one character in the set listed within the brackets
[^] represents any one character not in the set listed within the brackets
| represents the Or operator
\ is the Escape Special Character
() contains a character sequence

Example: Finding the Position of a Pattern in a String

The following example uses the following Regular Expression symbols.

$, which searches for a specified expression that occurs at the end of a string.
^, which searches for a specified expression that occurs at the beginning of a string.
\s*, which matches any number of whitespace characters, such as blank characters.
[B,C,S], which matches the uppercase letters B, C, and S.

In the following request, REG1 is the position of the expression 'iscotti', with any number of following whitespace characters, that occur the end of the PRODUCT field value. REG2 is the position of the characters 'B' ,C, or 'S' that occur at the beginning of the PRODUCT field value.

TABLE FILE GGSALES
SUM DOLLARS AND COMPUTE
REG1/I5 = REGEXP_INSTR(PRODUCT, 'iscotti\s*$');
REG2/I5 = REGEXP_INSTR(PRODUCT, '^[B,C,S]');
BY PRODUCT
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

The following version of the request runs on z/OS. The first regular expression can be input directly because the characters used are interpreted correctly. For the second regular expression, a variable is created to contain the pattern. This variable is then used in the function call.

-SET &REG2STR=CHAR(95) || CHAR(173) || 'B,C,S' || CHAR(189); 
TABLE FILE GGSALES                                           
SUM DOLLARS AND COMPUTE                                      
REG1/I5 = REGEXP_INSTR(PRODUCT, 'iscotti\s*$');              
REG2/I5 = REGEXP_INSTR(PRODUCT, '&REG2STR');                 
BY PRODUCT                                                   
ON TABLE SET PAGE NOLEAD                                     
ON TABLE SET STYLE *                                         
GRID=OFF,$                                                   
ENDSTYLE                                                     
END

The output follows.

  Product           Dollar Sales   REG1   REG2  
  -------           ------------   ----   ----  
  Biscotti               5263317      2      1  
  Capuccino              2381590      0      1  
  Coffee Grinder         2337567      0      1  
  Coffee Pot             2449585      0      1  
  Croissant              7749902      0      1  
  Espresso               3906243      0      0  
  Latte                 10943622      0      0  
  Mug                    4522521      0      0  
  Scone                  4216114      0      1  
  Thermos                2385829      0      0

REGEXP_REPLACE: Replacing All Matches to a Pattern in a String

How to:

Replace Matches to a Pattern in a String

REGEXP_REPLACE returns a string generated by replacing all matches to a regular expression pattern in the source string with the given replacement string. The replacement string can be a null string.

Syntax: How to Replace Matches to a Pattern in a String

REGEXP_REPLACE(string, pattern, replacement)

where:

string

Alphanumeric

Is the input string to be searched.

pattern

Alphanumeric

Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported

. represents any single character
* represents zero or more occurrences
+ represents one or more occurrences
? represents zero or one occurrence
^ represents beginning of line
$ represents end of line
[] represents any one character in the set listed within the brackets
[^] represents any one character not in the set listed within the brackets
| represents the Or operator
\ is the Escape Special Character
() contains a character sequence

replacement

Alphanumeric

Is the replacement string.

Example: Replacing Matches to a Pattern in a String

The following example uses the following Regular Expression symbol.

^, which searches for a specified expression that occurs at the beginning of a string.

In the following request REG1 replaces the string 'North' at the beginning of the REGION field value with the string 'South', and REG2 replaces the string 'Mid' at the beginning of the REGION field value with a null string.

TABLE FILE GGSALES
SUM DOLLARS NOPRINT AND COMPUTE
REG1/A25 = REGEXP_REPLACE(REGION, '^North', 'South');
REG2/A25 = REGEXP_REPLACE(REGION, '^Mid', '');
BY REGION
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

The following version of the request runs on z/OS. The regular expression string is created in a variable using a -SET command. The circumflex meta-character (^) is inserted as CHAR(95).

-SET &REGSTRING1= CHAR(95) || 'North' ;                    
-SET &REGSTRING2= CHAR(95) || 'Mid' ;                      
TABLE FILE GGSALES                                         
SUM DOLLARS NOPRINT AND COMPUTE                            
REG1/A25 = REGEXP_REPLACE(REGION, '&REGSTRING1', 'South'); 
REG2/A25 = REGEXP_REPLACE(REGION, '&REGSTRING2', '');      
BY REGION                                                  
ON TABLE SET PAGE NOLEAD                                   
ON TABLE SET STYLE *                                       
GRID=OFF,$                                                 
ENDSTYLE                                                   
END

The output follows.

  Region       REG1                       REG2      
  ------       ----                       ----      
  Midwest      Midwest                    west      
  Northeast    Southeast                  Northeast 
  Southeast    Southeast                  Southeast 
  West         West                       West

REGEXP_SUBSTR: Returning the First Match to a Pattern in a String

How to:

Returning the First Match to a Pattern in a String

REGEXP_SUBSTR returns a string that contains the first match to a specified regular expression pattern within a source string. If there is no match within the source string, a null string is returned.

Syntax: How to Returning the First Match to a Pattern in a String

REGEXP_SUBSTR(string, pattern)

where:

string

Alphanumeric

Is the input string to be searched.

pattern

Alphanumeric

Is a regular expression, enclosed in single quotation marks, constructed using literals and meta-characters. The following meta-characters are supported

. represents any single character
* represents zero or more occurrences
+ represents one or more occurrences
? represents zero or one occurrence
^ represents beginning of line
$ represents end of line
[] represents any one character in the set listed within the brackets
[^] represents any one character not in the set listed within the brackets
| represents the Or operator
\ is the Escape Special Character
() contains a character sequence

Example: Returning the First Match of a Pattern in a String

The following example uses the following Regular Expression symbols.

[A-Z], which matches any uppercase letter.
[a-z], which matches any lowercase letter.

In the following request, REG1 contains the first instance of a string within the REGION field value that starts with an uppercase letter, followed by any number of lowercase letters, followed by the characters 'west'. REG2 contains the first instance of a string within the REGION field value that starts with an uppercase letter, followed by any number of lowercase letters, followed by the characters 'east'.

TABLE FILE GGSALES
SUM DOLLARS NOPRINT AND COMPUTE
REG1/A25 = REGEXP_SUBSTR(REGION, '[A-Z][a-z]*west');
REG2/A25 = REGEXP_SUBSTR(REGION, '[A-Z][a-z]*east');
BY REGION
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image.

The following version of the request runs on z/OS, where the regular expression is generated as a variable, using the CHAR function to insert the meta-characters. Note that the asterisk meta-character (*) needs to be represented as CHAR(92).

-SET &REG1=CHAR(173) || 'A-Z' || CHAR(189) || CHAR(173) || 'a-z'
- || CHAR(189) || CHAR(92) || 'west' ; 
-SET &REG2=CHAR(173) || 'A-Z' || CHAR(189)  || CHAR(173) || 'a-z' 
- || CHAR(189) || '*east';    
TABLE FILE GGSALES 
SUM DOLLARS NOPRINT AND COMPUTE    
REG1/A25 = REGEXP_SUBSTR(REGION,'&REG1');  
REG2/A25 = REGEXP_SUBSTR(REGION,'&REG2');   
BY REGION   
ON TABLE SET PAGE NOLEAD 
ON TABLE SET STYLE *     
GRID=OFF,$  
ENDSTYLE  
END

The output follows.

Region       REG1                       REG2     
------       ----                       ----     
Midwest      Midwest                             
Northeast                               Northeast
Southeast                               Southeast
West