IMPUTE: Replacing Missing Values With Aggregated Values

How to:

IMPUTE calculates a value to replace missing numeric data on report output, within a partition.

In place of eliminating data records with missing values from analysis, IMPUTE enables you to substitute a variety of estimates for the missing values, including the mean, the median, the mode, or a numeric constant, all calculated within the data partition specified by the reset key. This function is designed to be used with detail level reports (PRINT or LIST commands), and with calculated values (fields created with the COMPUTE command).

Syntax: How to Replace Missing Values With Aggregated Values

IMPUTE(field, reset_key, replacement)

where:

field

Is the name of the numeric input field that is defined with MISSING ON.

reset_key

Defines the partition for the calculation. Valid values are:

  • A sort field name.
  • PRESET, which uses the break defined by the SET PARTITION_ON command.
  • TABLE, which performs the calculation on the entire table.
replacement

Is a numeric constant or one of the following:

  • MEAN
  • MEDIAN
  • MODE

Example: Replacing Missing Values With Aggregated Values

To run this example, the FOCUS data source SALEMISS must be created. SALEMISS is the SALES data source with some missing values added in the RETURNS and DAMAGED fields. The following is the SALEMISS Master File, which should be added to the IBISAMP application.

FILENAME=KSALES, SUFFIX=FOC, REMARKS='Legacy Metadata Sample: sales',$
 
SEGNAME=STOR_SEG, SEGTYPE=S1,
   FIELDNAME=STORE_CODE,  ALIAS=SNO,  FORMAT=A3,   $
   FIELDNAME=CITY,        ALIAS=CTY,  FORMAT=A15,  $
   FIELDNAME=AREA,        ALIAS=LOC,  FORMAT=A1,   $
 
SEGNAME=DATE_SEG, PARENT=STOR_SEG, SEGTYPE=SH1,
   FIELDNAME=DATE,        ALIAS=DTE,  FORMAT=A4MD, $
 
SEGNAME=PRODUCT, PARENT=DATE_SEG, SEGTYPE=S1,
   FIELDNAME=PROD_CODE,     ALIAS=PCODE,   FORMAT=A3,    FIELDTYPE=I, $
   FIELDNAME=UNIT_SOLD,     ALIAS=SOLD,    FORMAT=I5,    $
   FIELDNAME=RETAIL_PRICE,  ALIAS=RP,      FORMAT=D5.2M, $
   FIELDNAME=DELIVER_AMT,   ALIAS=SHIP,    FORMAT=I5,    $
   FIELDNAME=OPENING_AMT,   ALIAS=INV,     FORMAT=I5,    $
   FIELDNAME=RETURNS,       ALIAS=RTN,     FORMAT=I3,    MISSING=ON, $
   FIELDNAME=DAMAGED,       ALIAS=BAD,     FORMAT=I3,    MISSING=ON, $

The following procedure creates the SALEMISS data source and then adds the missing values to the RETURNS and DAMAGED fields:

CREATE FILE ibisamp/SALEMISS
MODIFY FILE ibisamp/SALEMISS
FIXFORM STORE_CODE/3 CITY/15 AREA/1 DATE/4 PROD_CODE/3
FIXFORM UNIT_SOLD/5 RETAIL_PRICE/5 DELIVER_AMT/5
FIXFORM OPENING_AMT/5 RETURNS/3 DAMAGED/3
MATCH STORE_CODE
ON NOMATCH INCLUDE
ON MATCH CONTINUE
MATCH DATE
ON NOMATCH INCLUDE
ON MATCH CONTINUE
MATCH PROD_CODE
ON NOMATCH INCLUDE
ON MATCH REJECT
DATA
14BSTAMFORD       S1212B10   60  .95   80   65 10  6
14BSTAMFORD       S1212B12   40 1.29   20   50  3  3
14BSTAMFORD       S1212B17   29 1.89   30   30  2  1
14BSTAMFORD       S1212C13   25 1.99   30   40  3  0
14BSTAMFORD       S1212C7    45 2.39   50   49  5  4
14BSTAMFORD       S1212D12   27 2.19   40   35  0  0
14BSTAMFORD       S1212E2    80  .99  100  100  9  4
14BSTAMFORD       S1212E3    70 1.09   80   90  8  9
14ZNEW YORK       U1017B10   30  .85   30   10  2  3
14ZNEW YORK       U1017B17   20 1.89   40   25  2  1
14ZNEW YORK       U1017B20   15 1.99   30    5  0  1
14ZNEW YORK       U1017C17   12 2.09   10   15  0  0
14ZNEW YORK       U1017D12   20 2.09   30   10  3  2
14ZNEW YORK       U1017E1    30  .89   25   45  4  7
14ZNEW YORK       U1017E3    35 1.09   25   45  4  2
77FUNIONDALE      R1018B20   25 2.09   40   25  1  1
77FUNIONDALE      R1018C7    40 2.49   40   40  0  0
K1 NEWARK         U1019B12   29 1.49   30   30  1  0
K1 NEWARK         U1018B10   13  .99   30   15  1  1
END
-RUN
MODIFY FILE ibisamp/SALEMISS
FIXFORM STORE_CODE/3 DATE/5 PROD_CODE/4
FIXFORM UNIT/3 RETAIL/5 DELIVER/3
FIXFORM OPEN/3 RETURNS/C3 DAMAGED/C3
MATCH STORE_CODE
ON NOMATCH INCLUDE
ON MATCH CONTINUE
MATCH DATE
ON NOMATCH INCLUDE
ON MATCH CONTINUE
MATCH PROD_CODE
ON NOMATCH INCLUDE
ON MATCH REJECT
DATA
14Z1017 C13 15 1.99 35 30    6  
14Z1017 C14 18 2.05 30 25 4     
14Z1017 E2  33 0.99 45 40
END
-RUN

The following request against the SALEMISS data source generates replacement values for the missing values in the RETURNS field, using only the values within the same store.

SET PARTITION_ON=FIRST
TABLE FILE SALEMISS
PRINT RETURNS
COMPUTE MEDIAN1 = IMPUTE(RETURNS, PRESET, MEDIAN);
COMPUTE MEAN1 = IMPUTE(RETURNS, PRESET, MEAN);
COMPUTE MODE1 = IMPUTE(RETURNS, PRESET, MODE);
BY STORE_CODE
ON TABLE SET PAGE NOPAGE
ON TABLE SET STYLE *
TYPE=REPORT, GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image. The missing values occur in store 14Z, and the replacement values are calculated using only the RETURNS values from that store because PARTITION_ON is set to FIRST.

Changing the PARTITION_ON setting to TABLE produces the following output, in which the replacement values are calculated using all of the rows in the table.