Pairwise Deletion of Missing Data vs. Mean Substitution

Another common method of avoiding the loss of data caused by casewise deletion is the so-called mean substitution of missing data (replacing every missing value in a variable with the mean of that variable's observed values).

This way of handling missing data can be requested in many modules. You can also use this method to permanently replace the missing values in your data set via the Replace Missing Data option on the Data menu.

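Mean substitution has both advantages and disadvantages compared to pairwise deletion. Its main advantage is that it produces internally consistent sets of results: once the missing values are filled in, every coefficient is computed from the same set of cases, so the result is a true correlation matrix. Under pairwise deletion, by contrast, each coefficient may be based on a different subset of cases.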
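The following is a minimal sketch of what "internally consistent" means here, using only the Python standard library. The data set, the helper names (pearson, pairwise_r, mean_substitute, det3), and the use of None as a missing-value marker are all assumptions made for illustration; they are not part of any particular statistics package. The example is deliberately extreme: each pair of variables is observed on a different third of the cases, so the three pairwise correlations cannot all hold at once, which shows up as a negative determinant.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def pairwise_r(xs, ys):
    """Pairwise deletion: use only cases where both values are present."""
    pairs = [(a, b) for a, b in zip(xs, ys) if a is not None and b is not None]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])

def mean_substitute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def det3(r_xy, r_xz, r_yz):
    """Determinant of a 3x3 correlation matrix with the given off-diagonals."""
    return 1 + 2 * r_xy * r_xz * r_yz - r_xy ** 2 - r_xz ** 2 - r_yz ** 2

M = None  # hypothetical missing-value marker
x = [1, 2, 3, 1, 2, 3, M, M, M]
y = [1, 2, 3, M, M, M, 1, 2, 3]
z = [M, M, M, 1, 2, 3, 3, 2, 1]

# Pairwise deletion: each coefficient comes from a different subset of cases.
pw = (pairwise_r(x, y), pairwise_r(x, z), pairwise_r(y, z))

# Mean substitution: every coefficient comes from the same nine cases.
xf, yf, zf = mean_substitute(x), mean_substitute(y), mean_substitute(z)
ms = (pearson(xf, yf), pearson(xf, zf), pearson(yf, zf))

det_pairwise = det3(*pw)
det_mean_sub = det3(*ms)

print("pairwise r:", pw, "determinant:", det_pairwise)
print("mean-substituted r:", ms, "determinant:", det_mean_sub)
```

A true correlation matrix cannot have a negative determinant. In this sketch the pairwise matrix does, while the mean-substituted matrix does not. Note that the mean-substituted coefficients are also noticeably smaller in magnitude than the pairwise ones.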

These are the main disadvantages:
  • Mean substitution artificially decreases the variation of scores, and this decrease in each variable grows with the proportion of missing data in that variable (that is, the more data are missing, the more perfectly average scores are artificially added to the data set).

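  • Because it replaces missing data with artificially created average data points, mean substitution may considerably change the values of correlations. A substituted value sits exactly at the variable's mean, so the case adds nothing to the covariance with other variables; this typically pulls correlations toward zero.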
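Both disadvantages can be seen in a small numeric sketch. The data below are invented for illustration, None is an assumed missing-value marker, and the helper names (sample_variance, pearson) are hypothetical. The observed (x, y) pairs lie exactly on a straight line, so any drop in the correlation after substitution is caused by the substituted means alone.

```python
from math import sqrt

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5, 6]
y = [2, None, 6, None, 10, 12]  # two of six values missing; y = 2x where observed

# Mean substitution: fill the gaps with the mean of the observed y values.
observed_y = [v for v in y if v is not None]
y_mean = sum(observed_y) / len(observed_y)
y_filled = [y_mean if v is None else v for v in y]

# Variance shrinks: the sum of squared deviations is unchanged, but it is
# now divided by N - 1 = 5 instead of n_observed - 1 = 3.
var_ratio = sample_variance(y_filled) / sample_variance(observed_y)

# Correlation on complete pairs only versus after substitution.
complete = [(a, b) for a, b in zip(x, y) if b is not None]
r_pairwise = pearson([a for a, _ in complete], [b for _, b in complete])
r_filled = pearson(x, y_filled)

print("variance ratio after substitution:", var_ratio)
print("r on complete pairs:", r_pairwise)
print("r after mean substitution:", r_filled)
```

Here the variance of y falls to three fifths of its observed value, and a correlation that is perfect on the complete pairs drops to roughly 0.92 after substitution. With a larger share of missing values both effects would be stronger.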