How to: |
IMPUTE calculates a value to replace missing numeric data on report output, within a partition.
In place of eliminating data records with missing values from analysis, IMPUTE enables you to substitute a variety of estimates for the missing values, including the mean, the median, the mode, or a numeric constant, all calculated within the data partition specified by the reset key. This function is designed to be used with detail level reports (PRINT or LIST commands), and with calculated values (fields created with the COMPUTE command).
IMPUTE(field, reset_key, replacement)
where:
Is the name of the numeric input field that is defined with MISSING ON.
Defines the partition for the calculation. Valid values are:
Is a numeric constant or one of the following:
To run this example, the FOCUS data source SALEMISS must be created. SALEMISS is the SALES data source with some missing values added in the RETURNS and DAMAGED fields. The following is the SALEMISS Master File, which should be added to the IBISAMP application.
FILENAME=KSALES, SUFFIX=FOC, REMARKS='Legacy Metadata Sample: sales',$ SEGNAME=STOR_SEG, SEGTYPE=S1, FIELDNAME=STORE_CODE, ALIAS=SNO, FORMAT=A3, $ FIELDNAME=CITY, ALIAS=CTY, FORMAT=A15, $ FIELDNAME=AREA, ALIAS=LOC, FORMAT=A1, $ SEGNAME=DATE_SEG, PARENT=STOR_SEG, SEGTYPE=SH1, FIELDNAME=DATE, ALIAS=DTE, FORMAT=A4MD, $ SEGNAME=PRODUCT, PARENT=DATE_SEG, SEGTYPE=S1, FIELDNAME=PROD_CODE, ALIAS=PCODE, FORMAT=A3, FIELDTYPE=I, $ FIELDNAME=UNIT_SOLD, ALIAS=SOLD, FORMAT=I5, $ FIELDNAME=RETAIL_PRICE, ALIAS=RP, FORMAT=D5.2M, $ FIELDNAME=DELIVER_AMT, ALIAS=SHIP, FORMAT=I5, $ FIELDNAME=OPENING_AMT, ALIAS=INV, FORMAT=I5, $ FIELDNAME=RETURNS, ALIAS=RTN, FORMAT=I3, MISSING=ON, $ FIELDNAME=DAMAGED, ALIAS=BAD, FORMAT=I3, MISSING=ON, $
The following procedure creates the SALEMISS data source and then adds the missing values to the RETURNS and DAMAGED fields:
CREATE FILE ibisamp/SALEMISS MODIFY FILE ibisamp/SALEMISS FIXFORM STORE_CODE/3 CITY/15 AREA/1 DATE/4 PROD_CODE/3 FIXFORM UNIT_SOLD/5 RETAIL_PRICE/5 DELIVER_AMT/5 FIXFORM OPENING_AMT/5 RETURNS/3 DAMAGED/3 MATCH STORE_CODE ON NOMATCH INCLUDE ON MATCH CONTINUE MATCH DATE ON NOMATCH INCLUDE ON MATCH CONTINUE MATCH PROD_CODE ON NOMATCH INCLUDE ON MATCH REJECT DATA 14BSTAMFORD S1212B10 60 .95 80 65 10 6 14BSTAMFORD S1212B12 40 1.29 20 50 3 3 14BSTAMFORD S1212B17 29 1.89 30 30 2 1 14BSTAMFORD S1212C13 25 1.99 30 40 3 0 14BSTAMFORD S1212C7 45 2.39 50 49 5 4 14BSTAMFORD S1212D12 27 2.19 40 35 0 0 14BSTAMFORD S1212E2 80 .99 100 100 9 4 14BSTAMFORD S1212E3 70 1.09 80 90 8 9 14ZNEW YORK U1017B10 30 .85 30 10 2 3 14ZNEW YORK U1017B17 20 1.89 40 25 2 1 14ZNEW YORK U1017B20 15 1.99 30 5 0 1 14ZNEW YORK U1017C17 12 2.09 10 15 0 0 14ZNEW YORK U1017D12 20 2.09 30 10 3 2 14ZNEW YORK U1017E1 30 .89 25 45 4 7 14ZNEW YORK U1017E3 35 1.09 25 45 4 2 77FUNIONDALE R1018B20 25 2.09 40 25 1 1 77FUNIONDALE R1018C7 40 2.49 40 40 0 0 K1 NEWARK U1019B12 29 1.49 30 30 1 0 K1 NEWARK U1018B10 13 .99 30 15 1 1 END -RUN
MODIFY FILE ibisamp/SALEMISS FIXFORM STORE_CODE/3 DATE/5 PROD_CODE/4 FIXFORM UNIT/3 RETAIL/5 DELIVER/3 FIXFORM OPEN/3 RETURNS/C3 DAMAGED/C3 MATCH STORE_CODE ON NOMATCH INCLUDE ON MATCH CONTINUE MATCH DATE ON NOMATCH INCLUDE ON MATCH CONTINUE MATCH PROD_CODE ON NOMATCH INCLUDE ON MATCH REJECT DATA 14Z1017 C13 15 1.99 35 30 6 14Z1017 C14 18 2.05 30 25 4 14Z1017 E2 33 0.99 45 40 END -RUN
The following request against the SALEMISS data source generates replacement values for the missing values in the RETURNS field, using only the values within the same store.
SET PARTITION_ON=FIRST TABLE FILE SALEMISS PRINT RETURNS COMPUTE MEDIAN1 = IMPUTE(RETURNS, PRESET, MEDIAN); COMPUTE MEAN1 = IMPUTE(RETURNS, PRESET, MEAN); COMPUTE MODE1 = IMPUTE(RETURNS, PRESET, MODE); BY STORE_CODE ON TABLE SET PAGE NOPAGE ON TABLE SET STYLE * TYPE=REPORT, GRID=OFF,$ ENDSTYLE END
The output is shown in the following image. The missing values occur in store 14Z, and the replacement values are calculated using only the RETURNS values from that store because PARTITION_ON is set to FIRST.
Changing the PARTITION_ON setting to TABLE produces the following output, in which the replacement values are calculated using all of the rows in the table.