DIFFERENCE: Measuring the Phonetic Similarity Between Character Strings

DIFFERENCE returns an integer value measuring the difference between the SOUNDEX or METAPHONE values of two character expressions.

Syntax: How to Measure the Phonetic Similarity Between Character Strings

DIFFERENCE(chrexp1, chrexp2)

where:

chrexp1, chrexp2

Alphanumeric

Are the character strings to be compared.

The returned value ranges from zero (0), which represents the least similarity, up to a maximum that depends on the algorithm: 4 represents the most similarity for SOUNDEX, and 16 represents the most similarity for METAPHONE.

The use of SOUNDEX or METAPHONE depends on the PHONETIC_ALGORITHM setting. METAPHONE is the default algorithm.

Example: Measuring the Phonetic Similarity Between Character Strings

The following request uses DIFFERENCE with the default phonetic algorithm (METAPHONE) to compare first names in the data source with the names JOHN and MARY.

TABLE FILE VIDEOTRK
PRINT FIRSTNAME
COMPUTE
JOHN_DIFF/I5 = DIFFERENCE(FIRSTNAME, 'JOHN');
MARY_DIFF/I5 = DIFFERENCE(FIRSTNAME, 'MARY');
BY LASTNAME NOPRINT
WHERE RECORDLIMIT EQ 30
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image. Note that the names JOANN and JOHN have the highest scores for matching with JOHN, and that MARCIA, MICHAEL, and MARTHA have the highest scores for matching with MARY.
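
As a variation on the request above, the following sketch switches the phonetic algorithm to SOUNDEX before running the same comparison, so that the returned scores fall between 0 and 4 rather than between 0 and 16. It assumes that PHONETIC_ALGORITHM is changed with the standard SET parameter = value command form, which this topic does not show. Verify the exact syntax against the SET parameter reference for your release before relying on it.

```
-* Assumption: PHONETIC_ALGORITHM follows the usual SET parameter = value form.
SET PHONETIC_ALGORITHM = SOUNDEX
-* Same comparison as the METAPHONE example, now scored on the SOUNDEX scale.
TABLE FILE VIDEOTRK
PRINT FIRSTNAME
COMPUTE
JOHN_DIFF/I5 = DIFFERENCE(FIRSTNAME, 'JOHN');
MARY_DIFF/I5 = DIFFERENCE(FIRSTNAME, 'MARY');
BY LASTNAME NOPRINT
WHERE RECORDLIMIT EQ 30
ON TABLE SET PAGE NOLEAD
END
```

Because the two algorithms use different scales, scores produced under SOUNDEX should not be compared directly with scores produced under METAPHONE.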