OUTLIER: Identifying Outliers in Numeric Data

How to:

The 1.5 * IQR (Inner Quartile Range) rule is a common way to identify outliers in data. This rule defines an outlier as a value that is above or below 1.5 times the inner quartile range in the data. The inner quartile range is based on sorting the data values, dividing it into equal quarters, and calculating the range of values between the first quartile (the value one quarter of the way through the sorted data) and third quartile (the value three quarters of the way through the sorted data). The value that is 1.5 times below the inner quartile range is called the lower fence, and the value that is 1.5 times above the inner quartile range is called the upper fence.

OUTLIER is not supported in a DEFINE expression. It can be used in a COMPUTE expression or a WHERE, WHERE TOTAL, or WHERE_GROUPED phrase.

Given a numeric field as input, OUTLIER returns one of the following values for each value of the field, using the 1.5 * IQR rule:

Syntax: How to Identify Outliers in Numeric Data

OUTLIER(input_field)	

where:

input_field

Numeric

Is the numeric field to be analyzed.

Example: Identifying Outliers

The following request defines the SALES field to have different values depending on the store code, and uses OUTLIER to determine whether each field value is an outlier.

DEFINE FILE GGSALES
SALES/D12 = IF ((CATEGORY EQ 'Coffee') AND (STCD EQ 'R1019')) THEN 19000
  ELSE IF ((CATEGORY EQ 'Coffee') AND (STCD EQ 'R1020')) THEN 20000
  ELSE IF ((CATEGORY EQ 'Coffee') AND (STCD EQ 'R1040')) THEN 7000
  ELSE DOLLARS;
END 
TABLE FILE GGSALES
SUM SALES 
COMPUTE OUT1/I3 = OUTLIER(SALES);
BY CATEGORY
BY STCD
WHERE CATEGORY EQ 'Coffee'
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image. Values above 2 million are above the upper fence, values below 1 million are below the lower fence, and other values are not outliers: