Normalization by Trimmed Mean 


The trimmed mean for a variable is the mean of all values except a certain percentage of the lowest and highest values for that variable. Excluding these values reduces the effect of outliers on the normalization. If the trim value is set to 10%, the highest 5% and the lowest 5% of the values are excluded from the calculated mean.
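The trimming rule above can be sketched in a few lines of Python. This is a minimal illustration, not the application's own implementation: the function name trimmed_mean is hypothetical, and rounding the number of dropped values down at each end is an assumption, since the text does not say how fractional counts are handled.

```python
import math

def trimmed_mean(values, trim=0.10):
    """Mean of `values` after dropping trim/2 of the lowest and
    trim/2 of the highest values.

    Assumption: the number of values dropped at each end is rounded down.
    """
    if not 0 <= trim < 1:
        raise ValueError("trim must be in [0, 1)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim / 2)  # values dropped at each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)

# 20 values with one outlier; a 10% trim drops the single lowest
# and the single highest value before averaging.
data = list(range(1, 20)) + [1000]
print(sum(data) / len(data))      # plain mean: 59.5
print(trimmed_mean(data, 0.10))   # trimmed mean: 10.5
```

With the outlier 1000 excluded, the trimmed mean reflects the bulk of the data, which is the property the normalization relies on.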

Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. The remaining variables in the rows are normalized in the same way.

Without rescaling (Baseline variable = None)

The normalized value of e_i for variable E in the i-th row is calculated as:

[Equation image: norm_trimmed_without_eq.png]

where

T = the set of rows left after trimming

p = the number of rows in T.
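The equation itself is only present here as an image reference. A form consistent with the definitions of T and p above, offered as a reconstruction rather than a copy of the original figure, divides each value by the trimmed mean of E. The symbol e_i' for the normalized value is an assumption introduced for this sketch.

```latex
e_i' = \frac{e_i}{\dfrac{1}{p} \sum_{j \in T} e_j}
```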

Rescaling by a baseline variable

If we select variable A as the baseline variable, the normalized value of e_i for variable E in the i-th row is calculated as:

[Equation image: images/n_trimmed_with.gif]

where

T = the set of rows left after trimming

p = the number of rows in T

a_j = the value for variable A in the j-th row.
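The equation for this case is likewise only present as an image reference. One form consistent with the definitions of T, p and a_j above multiplies each value by the ratio of the trimmed mean of the baseline variable A to the trimmed mean of E. This is a hedged reconstruction: the symbol e_i' is an assumption, and reading the definitions literally, the same T and p are used in both sums. Whether the trimming for A is determined separately from the trimming for E is not stated here.

```latex
e_i' = e_i \cdot \frac{\dfrac{1}{p} \sum_{j \in T} a_j}{\dfrac{1}{p} \sum_{j \in T} e_j}
```

Under this reading, each normalized variable is rescaled so that its trimmed mean matches that of the baseline variable.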

See also:

Normalizing Columns