PATTERN: Generating a Pattern From a String
The PATTERN function examines a source string and produces a pattern that indicates the sequence of numbers, uppercase letters, and lowercase letters in the source string. This function is useful for examining data to make sure that it follows a standard pattern.
In the output pattern:
- Any character from the input that represents a single-byte digit becomes the character 9.
- Any character that represents an uppercase letter becomes A, and any character that represents a lowercase letter becomes a. For European NLS mode (Western Europe, Central Europe), A and a are extended to apply to accented alphabets.
- For Japanese, double-byte characters and Hankaku-katakana become C (uppercase). Note that double-byte includes Hiragana, Katakana, Kanji, full-width alphabets, full-width numbers, and full-width symbols. This means that all double-byte letters such as Chinese and Korean are also represented as C.
- Special characters remain unchanged.
- An unprintable character becomes the character X.
Generate a Pattern From an Input String
PATTERN (length, source_string, output)
where:
length
Numeric
Is the length of source_string.
source_string
Alphanumeric
Is the source string enclosed in single quotation marks, or a field containing the source string.
output
Alphanumeric
Is the name of the field to contain the result or the format of the field enclosed in single quotation marks.
Producing a Pattern From Alphanumeric Data
The following 19 records are stored in a fixed-format sequential file (with LRECL 14) named TESTFILE:
212-736-6250
212 736 4433
123-45-6789
800-969-INFO
10121-2898
10121
2 Penn Plaza
917-339-6380
917-339-4350
(212) 736-6250
(212) 736-4433
212-736-6250
212-736-6250
212-736-6250
(212) 736 5533
(212) 736 5533
(212) 736 5533
10121 Æ
800-969-INFO
The Master File is:
FILENAME=TESTFILE, SUFFIX=FIX ,
SEGMENT=TESTFILE, SEGTYPE=S0, $
FIELDNAME=TESTFLD, USAGE=A14, ACTUAL=A14, $
The following request generates a pattern for each instance of TESTFLD and sorts the report by the generated pattern. The SUM command shows the count of each pattern and its percentage of the total count. The PRINT command shows which values of TESTFLD generated each pattern.
FILEDEF TESTFILE DISK testfile.ftm
DEFINE FILE TESTFILE
PATTERN/A14 = PATTERN (14, TESTFLD, 'A14' ) ;
END
TABLE FILE TESTFILE
SUM CNT.PATTERN AS 'COUNT' PCT.CNT.PATTERN AS 'PERCENT'
BY PATTERN
PRINT TESTFLD
BY PATTERN
ON TABLE COLUMN-TOTAL
END
Note that the next-to-last pattern in the output was generated from an input string that contained an unprintable character, so that character was changed to X. Otherwise, each numeric digit generated the character 9 in the output string, each uppercase letter generated the character A, and each lowercase letter generated the character a. Special characters and spaces were left unchanged. The output is:
PATTERN         COUNT  PERCENT  TESTFLD
-------         -----  -------  -------
(999) 999 9999      3    15.79  (212) 736 5533
                                (212) 736 5533
                                (212) 736 5533
(999) 999-9999      2    10.53  (212) 736-6250
                                (212) 736-4433
9 Aaaa Aaaaa        1     5.26  2 Penn Plaza
999 999 9999        1     5.26  212 736 4433
999-99-9999         1     5.26  123-45-6789
999-999-AAAA        2    10.53  800-969-INFO
                                800-969-INFO
999-999-9999        6    31.58  212-736-6250
                                917-339-6380
                                917-339-4350
                                212-736-6250
                                212-736-6250
                                212-736-6250
99999               1     5.26  10121
99999 X             1     5.26  10121 Æ
99999-9999          1     5.26  10121-2898
TOTAL              19   100.00
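The COUNT and PERCENT columns can be cross-checked outside FOCUS. The Python sketch below is only an approximation under stated assumptions: `ascii_pattern` is a hypothetical ASCII-only stand-in for PATTERN that maps anything outside printable ASCII to X, and the record list is typed in from the TESTFILE listing rather than read from testfile.ftm.

```python
from collections import Counter


def ascii_pattern(value: str) -> str:
    # Hypothetical stand-in for PATTERN; ASCII-only, no NLS or double-byte rules.
    def classify(ch: str) -> str:
        if not ch.isascii() or not ch.isprintable():
            return "X"  # assumption: treated as unprintable
        if ch.isdigit():
            return "9"
        if ch.isalpha():
            return "A" if ch.isupper() else "a"
        return ch       # special characters and spaces are unchanged
    return "".join(classify(ch) for ch in value)


# The 19 TESTFILE records, copied from the listing.
records = [
    "212-736-6250", "212 736 4433", "123-45-6789", "800-969-INFO",
    "10121-2898", "10121", "2 Penn Plaza", "917-339-6380",
    "917-339-4350", "(212) 736-6250", "(212) 736-4433", "212-736-6250",
    "212-736-6250", "212-736-6250", "(212) 736 5533", "(212) 736 5533",
    "(212) 736 5533", "10121 \u00c6", "800-969-INFO",
]

counts = Counter(ascii_pattern(r) for r in records)
total = sum(counts.values())
# Map each pattern to (count, percentage of all records).
summary = {p: (n, round(100 * n / total, 2)) for p, n in counts.items()}

if __name__ == "__main__":
    for p, (n, pct) in sorted(summary.items()):
        print(f"{p:<14} {n:>5} {pct:>7.2f}")
```

The rows may print in a different order than in the report, because Python's default string ordering is not necessarily the collating sequence that BY PATTERN uses.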
PATTERN generates a pattern for each instance of TESTFLD. The result is stored in a column with the format A14:
PATTERN (14, TESTFLD, 'A14' )
For 212-736-6250, the result is 999-999-9999.
For 800-969-INFO, the result is 999-999-AAAA.