Casewise vs. Pairwise Deletion of Missing Data

By default, missing data are handled by casewise deletion when a correlation matrix is calculated: every case that has missing data in at least one of the selected variables is excluded from the analysis.

This method yields a true correlation matrix, in which all correlations are computed from the same set of observations. However, if missing data are randomly distributed across cases, you could easily end up with no valid cases in the data set, because each case will have at least one missing value in some variable.
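The effect of casewise deletion on sample size can be sketched in a few lines of Python. This is an illustration only, not Statistica code: missing values are represented by None, the function names are hypothetical, and the second function assumes that each value is missing independently with the same probability.

```python
def complete_cases(rows):
    """Casewise deletion: keep only rows that have no missing value (None)."""
    return [row for row in rows if all(value is not None for value in row)]


def expected_complete_fraction(p_missing, n_vars):
    """Expected share of complete cases, ASSUMING every value is missing
    independently with probability p_missing."""
    return (1.0 - p_missing) ** n_vars


# Four cases, three variables; each of the first three cases lacks one value.
rows = [
    (1.0, 2.0, None),
    (2.0, None, 1.5),
    (None, 3.0, 2.5),
    (4.0, 5.0, 3.0),
]
kept = complete_cases(rows)
print(len(kept))  # 1

# With 10% missing values in each of 20 variables, few cases remain complete.
print(round(expected_complete_fraction(0.10, 20), 3))  # 0.122
```

Under that independence assumption, only about 12% of cases would survive casewise deletion with 20 variables, even though just 10% of the individual values are missing.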

The most common solution in such instances is so-called pairwise deletion of missing data, where the correlation between each pair of variables is calculated from all cases that have valid data on those two variables. In many instances this method works well, especially when the total percentage of missing data is low (for instance, 10%) and the missing values are relatively randomly distributed across cases and variables. However, it may sometimes lead to serious problems.
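A minimal sketch of pairwise deletion, again in plain Python and not taken from Statistica: for every pair of variables, only the cases with valid data on both are used, and the number of such cases is recorded alongside the coefficient. The helper names and the small data set are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long lists without missing values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pairwise_corr(columns):
    """Return two dicts keyed by (a, b): the correlation and the number of
    cases it is based on. None marks a missing value."""
    r, n = {}, {}
    for a in columns:
        for b in columns:
            pairs = [(x, y) for x, y in zip(columns[a], columns[b])
                     if x is not None and y is not None]
            n[a, b] = len(pairs)
            r[a, b] = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    return r, n


columns = {
    "x": [1, 2, 3, 4, 5, None],
    "y": [2, 4, 6, None, 10, 12],
    "z": [None, 1, 0, 3, 2, 5],
}
r, n = pairwise_corr(columns)
for pair in [("x", "y"), ("x", "z"), ("y", "z")]:
    print(pair, "r =", round(r[pair], 3), "based on", n[pair], "cases")
```

In this example each coefficient rests on four cases, but on a different four each time, whereas casewise deletion would leave only the three cases that are complete on all variables.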

For example, a systematic bias may result from a hidden systematic pattern in the missing data, because different correlation coefficients in the same correlation matrix are then based on different subsets of subjects. In addition to the possibly biased conclusions that you could derive from such pairwise calculated correlation matrices, real problems may occur when you subject such a matrix to another analysis (such as multiple regression, factor analysis, or cluster analysis) that expects a true correlation matrix, with a certain level of consistency and transitivity between different coefficients. A pairwise matrix may turn out not to be a true correlation matrix (for example, it may fail to be positive semidefinite), and the other program will either be unable to process it or will give erroneous results.
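One way to test whether a matrix is usable as a correlation matrix is to attempt a Cholesky factorization, which succeeds only for positive definite matrices. The sketch below is a plain-Python illustration with a hypothetical function name and made-up coefficients; it is not the check that any particular program performs, and it also rejects singular (merely positive semidefinite) matrices.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorization of a symmetric matrix.
    Returns False as soon as a pivot is not clearly positive."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# A and B correlate 0.9, A and C correlate 0.9, yet B and C correlate -0.9.
# No single set of cases can produce these three coefficients together.
inconsistent = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, -0.9],
    [0.9, -0.9, 1.0],
]
consistent = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.9],
    [0.9, 0.9, 1.0],
]
print(is_positive_definite(inconsistent))  # False
print(is_positive_definite(consistent))    # True
```

The first matrix shows the kind of inconsistency that can arise when each coefficient comes from a different subset of cases: every entry is a legitimate correlation on its own, but the three cannot hold at once.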

In Statistica you can either save a matrix in Basic Statistics and Tables and access it from another program, or calculate the matrix casewise or pairwise in the respective program. In either case, if you are using the pairwise method of deleting missing data, be sure to examine the distribution of missing data across the cells of the matrix for possible systematic patterns.
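Outside Statistica, a first look at the pattern of missing data could be taken by counting which combinations of variables are missing together. The sketch below uses plain Python with None for missing values; the function name and data are hypothetical, and a tabulation like this only hints at systematic patterns rather than proving them.

```python
from collections import Counter


def missing_patterns(columns):
    """Count, for each case, the tuple of variable names that are missing.
    An empty tuple stands for a complete case."""
    names = list(columns)
    rows = zip(*(columns[name] for name in names))
    return Counter(
        tuple(name for name, value in zip(names, row) if value is None)
        for row in rows
    )


columns = {
    "x": [1, None, 3, None, 5, 6],
    "y": [2, None, 6, None, 10, 12],
    "z": [1, 1, None, 3, 2, 5],
}
patterns = missing_patterns(columns)
for pattern, count in patterns.most_common():
    print(pattern if pattern else "complete", count)
```

Here x and y are always missing in the same cases, which is the sort of regularity that deserves a closer look before a pairwise matrix is passed on to another analysis.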