PATTERN: Generating a Pattern From a String

The PATTERN function examines a source string and produces a pattern that indicates the sequence of numbers, uppercase letters, and lowercase letters in the source string. This function is useful for examining data to make sure that it follows a standard pattern.

In the output pattern:

  • Any character from the input that represents a single-byte digit becomes the character 9.
  • Any character that represents an uppercase letter becomes A, and any character that represents a lowercase letter becomes a. For European NLS mode (Western Europe, Central Europe), A and a are extended to apply to accented alphabets.
  • For Japanese, double-byte characters and Hankaku-katakana become C (uppercase). Note that double-byte includes Hiragana, Katakana, Kanji, full-width alphabets, full-width numbers, and full-width symbols. This means that all double-byte letters such as Chinese and Korean are also represented as C.
  • Special characters remain unchanged.
  • An unprintable character becomes the character X.

Generate a Pattern From an Input String

PATTERN (length, source_string,  output)

where:

length

Numeric

Is the length of source_string.

source_string

Alphanumeric

Is the source string enclosed in single quotation marks, or a field containing the source string.

output

Alphanumeric

Is the name of the field to contain the result or the format of the field enclosed in single quotation marks.

Producing a Pattern From Alphanumeric Data

The following 19 records are stored in a fixed-format sequential file (with LRECL 14) named TESTFILE:

212-736-6250
212 736 4433
123-45-6789
800-969-INFO
10121-2898
10121
2 Penn Plaza
917-339-6380
917-339-4350
(212) 736-6250
(212) 736-4433
212-736-6250
212-736-6250
212-736-6250
(212) 736 5533
(212) 736 5533
(212) 736 5533
10121 Æ
800-969-INFO

The Master File is:

FILENAME=TESTFILE, SUFFIX=FIX     ,            
  SEGMENT=TESTFILE, SEGTYPE=S0, $              
    FIELDNAME=TESTFLD, USAGE=A14, ACTUAL=A14, $

The following request generates a pattern for each instance of TESTFLD and displays them by the pattern that was generated. It shows the count of each pattern and its percentage of the total count. The PRINT command shows which values of TESTFLD generated each pattern.

FILEDEF TESTFILE DISK testfile.ftmDEFINE FILE TESTFILE
   PATTERN/A14 = PATTERN (14, TESTFLD, 'A14' ) ;
END
TABLE FILE TESTFILE
  SUM CNT.PATTERN AS 'COUNT' PCT.CNT.PATTERN AS 'PERCENT'
   BY PATTERN
 PRINT TESTFLD
   BY PATTERN
ON TABLE COLUMN-TOTAL
END

Note that the next to the last line produced a pattern from an input string that contained an unprintable character, so that character was changed to X. Otherwise, each numeric digit generated a 9 in the output string, each uppercase letter generated the character ‘A’, and each lowercase letter generated the character ‘a’. The output is:

PATTERN         COUNT  PERCENT  TESTFLD
-------         -----  -------  -------
(999) 999 9999      3    15.79  (212) 736 5533
                                (212) 736 5533
                                (212) 736 5533
(999) 999-9999      2    10.53  (212) 736-6250
                                (212) 736-4433
9 Aaaa Aaaaa        1     5.26  2 Penn Plaza
999 999 9999        1     5.26  212 736 4433
999-99-9999         1     5.26  123-45-6789
999-999-AAAA        2    10.53  800-969-INFO
                                800-969-INFO
999-999-9999        6    31.58  212-736-6250
                                917-339-6380
                                917-339-4350
                                212-736-6250
                                212-736-6250
                                212-736-6250
99999               1     5.26  10121
99999 X             1     5.26  10121 Æ
99999-9999          1     5.26  10121-2898
TOTAL              19   100.00

PATTERN generates a pattern for each instance of TESTFLD. The result is stored in a column with the format A14:

PATTERN (14, TESTFLD, 'A14' )

For 212-736-6250, the result is 999-999-9999.

For 800-969-INFO, the result is 1999-999-AAAA.