Workspace Node: Data Health Check Summary - Specifications - Redundancy Tab

In the Data Health Check Summary node dialog box, under the Specifications heading, select the Redundancy tab to access the following options.

Run redundancy check. Select this check box to perform the check for redundant variables as defined by the other options specified on this tab.

Continuous vs. continuous variables. Specify the correlation coefficient threshold for determining redundant continuous variables. Variables with correlation coefficients at or above the threshold are identified as redundant. The correlation is computed as follows:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

n = number of observations

$\bar{x}$ = sample mean of X values

$\bar{y}$ = sample mean of Y values
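The correlation check can be illustrated with a minimal Python sketch. It assumes the standard sample (Pearson) correlation coefficient; the function names pearson_r and is_redundant and the default threshold of 0.9 are hypothetical and are not part of the node. The comparison follows the wording above literally (coefficient at or above the threshold); whether the node compares the absolute value instead is not stated here.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def is_redundant(x, y, threshold=0.9):
    """Hypothetical helper: flag the pair when r is at or above the threshold."""
    return pearson_r(x, y) >= threshold
```

For example, is_redundant([1, 2, 3, 4], [2, 4, 6, 8]) flags the pair, because the second variable is an exact linear function of the first.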

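Continuous vs. categorical variables. Specify the coefficient of determination (R²) threshold for determining redundancy among pairs of categorical and continuous variables. R² is computed as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

n = number of observations

$y_i$ = observed value of the continuous variable

$\bar{y}$ = sample mean of the continuous variable

$\hat{y}_i$ = predicted value of the continuous variable when using the categorical variable as a predictor variable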
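As a rough illustration of the continuous vs. categorical check, the sketch below computes R² in Python. It assumes that the predicted value for each observation is the mean of the continuous variable within that observation's category (the least-squares fit for a single categorical predictor). The function name r_squared_by_category is hypothetical and is not part of the node.

```python
from collections import defaultdict


def r_squared_by_category(categories, values):
    """R-squared of a continuous variable predicted from one categorical variable.

    Assumption: the predicted value for an observation is the mean of
    `values` within that observation's category.
    """
    n = len(values)
    if n != len(categories) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    grand_mean = sum(values) / n

    # Collect the continuous values that fall in each category.
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    group_mean = {cat: sum(vals) / len(vals) for cat, vals in groups.items()}

    ss_total = sum((val - grand_mean) ** 2 for val in values)
    ss_resid = sum(
        (val - group_mean[cat]) ** 2 for cat, val in zip(categories, values)
    )
    if ss_total == 0:
        raise ValueError("R-squared is undefined for a constant variable")
    return 1 - ss_resid / ss_total
```

A value near 1 means the category almost fully determines the continuous variable; a value near 0 means the category means barely differ from the overall mean.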

Categorical vs. categorical variables. Specify the Cramer's V threshold for determining redundancy among pairs of categorical variables. Cramer's V is computed as follows:

$$V = \sqrt{\frac{\mathrm{Chi2}}{N \cdot \min(\mathrm{NRows} - 1,\ \mathrm{NCols} - 1)}}$$

Chi2 = Pearson chi-square statistic

N = Number of observations

NRows = Number of rows in crosstabulation

NCols = Number of columns in crosstabulation
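The Cramer's V check can be sketched in Python along the same lines. The sketch builds the crosstabulation from two equal-length sequences of category labels and uses the uncorrected Pearson chi-square statistic; whether the node applies any correction is not stated here. The function name cramers_v is hypothetical.

```python
import math
from collections import Counter


def cramers_v(a, b):
    """Cramer's V for two categorical variables given as equal-length sequences."""
    n = len(a)
    if n != len(b) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    cell_counts = Counter(zip(a, b))  # observed crosstabulation cells
    row_counts = Counter(a)           # row totals
    col_counts = Counter(b)           # column totals
    n_rows, n_cols = len(row_counts), len(col_counts)
    if min(n_rows, n_cols) < 2:
        raise ValueError("each variable needs at least two categories")

    # Pearson chi-square: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for row, row_total in row_counts.items():
        for col, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = cell_counts.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
```

Two variables whose categories map one-to-one yield a value of 1, and two variables whose crosstabulation matches the expected counts exactly yield 0.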

Generate color maps. Select this check box to produce color maps for Correlation, Cramer's V, and ANOVA. This procedure may be time-consuming.

Automatically remove redundant variables. Select this check box to automatically remove redundant variables from downstream documents. Note that this option has no effect if a downstream document is not created. A downstream document is created by selecting the Display Data Diagnostic Report and Apply Data Cleaning check box on the Results tab.

Options. See Common Options.

OK. Click the OK button to accept all the specifications made in the dialog box and to close it.