Normalizing columns
A number of normalization methods can be written as expressions, or used when transforming data. See the links at the end of this topic for a description of the theory behind each method.
In the expression examples below, the following values are used:
Columns: E and A, where E is the column to normalize and A is a baseline column.
Percentile value: P
Normalize by mean
[E]/Avg([E])
[E]*Avg([A])/Avg([E])
Normalize by trimmed mean
[E]/TrimmedMean([E], P)
[E]*TrimmedMean([A], P)/TrimmedMean([E], P)
Normalize by percentile
[E]/Percentile([E], P)
[E]*Percentile([A], P)/Percentile([E], P)
Scale between 0 and 1
If(Max([E])= Min([E]), 0.5,
([E]-Min([E]))/(Max([E])-Min([E])))
Subtract the mean
[E]-Avg([E])
Subtract the median
[E]-Median([E])
Normalization by signed ratio
If([E]>[A], [E]/[A], -[A]/[E])
Normalization by log ratio
Log10([E]/[A])
Normalization by log ratio in standard deviation units
Log10([E]/[A])/StdDev(Log10([E]/[A]))
Z-score calculation
([E]-Avg([E]))/StdDev([E])
Normalize by standard deviation
[E]/StdDev([E])
- Normalization by mean
Assume that there are n rows with seven variables (columns), A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. The remaining variables in the rows are normalized in the same way. - Normalization by trimmed mean
The trimmed mean for a variable is based on all values except a certain percentage of the lowest and highest values for that variable. This removes the effect of outliers during the normalization. If the trim value is set to 10% then the highest 5% of the values and the lowest 5% of the values are excluded from the calculated mean. - Normalization by percentile
Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. The remaining variables in the rows are normalized in the same way. - Normalization by scaling between 0 and 1
Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. The remaining variables in the rows are normalized in the same way. - Normalization by subtracting the mean
Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. The remaining variables in the rows are normalized in the same way. - Normalization by subtracting the median
Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. The remaining variables in the rows are normalized in the same way. - Normalization by signed ratio
Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. All target variables are normalized in the same way. - Normalization by log ratio
Assume that there are n rows with seven variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. All target variables are normalized in the same way. - Normalization by log ratio in standard deviation units
Assume that there are n rows with k variables, A, B, C, D, E, F and G, in the data. We use variable E as an example in the calculations below. All target variables are normalized in the same way. - Normalization by Z-score
Assume that there are five rows with the IDs A, B, C, D and E, each row containing n different variables (columns). We use record E as an example in the calculations below. The remaining rows are normalized in the same way. - Normalization by standard deviation
Assume that there are five rows with the IDs A, B, C, D and E, each row containing n different variables (columns). We use record E as an example in the calculations below. The remaining rows are normalized in the same way.