Normalization (DB)

Performs normalization on the selected columns of the input data set. Normalization means adjusting values measured on different scales to a notionally common scale.

Information at a Glance

Category Transform
Data source type DB
Sends output to other operators Yes
Data processing tool n/a
Note: The Normalization (DB) operator is for database data only. For Hadoop data, use the Normalization (HD) operator.

Algorithm

You can accomplish normalization in various ways.

  • By specifying a user-defined minimum and maximum value.
  • By a z-transformation (rescaling values to mean 0 and variance 1).
  • By a transformation that expresses each value as a proportion of the average or sum of the respective attribute.

These approaches correspond to four normalization types that you can select.

  • Z-Transformation.
  • Proportion Transformation.
  • Range Transformation.
  • Divide-By-Average Transformation.

See Method under Configuration for a definition of each type.
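
The four types can be sketched in a few lines of Python. This is a minimal illustration of the standard formulas, not the operator's actual implementation: the function names are hypothetical, the z-transformation here assumes the population variance (the operator may use the sample variance instead), and the default target range of 0 to 1 is also an assumption. Columns that are constant, or whose average or sum is zero, would cause a division by zero and are not handled.

```python
from math import sqrt
from statistics import mean, pvariance


def divide_by_average(values):
    """Divide each value by the column average."""
    avg = mean(values)
    return [v / avg for v in values]


def proportion(values):
    """Divide each value by the column sum."""
    total = sum(values)
    return [v / total for v in values]


def z_transform(values):
    """Subtract the mean and divide by the standard deviation.

    Assumption: population variance; the result has mean 0 and variance 1.
    """
    mu = mean(values)
    sigma = sqrt(pvariance(values, mu))
    return [(v - mu) / sigma for v in values]


def range_transform(values, range_min=0.0, range_max=1.0):
    """Rescale linearly so the column spans [range_min, range_max]."""
    lo, hi = min(values), max(values)
    scale = (range_max - range_min) / (hi - lo)
    return [range_min + (v - lo) * scale for v in values]
```

For a column containing 2, 4, 6, and 8, the average is 5 and the sum is 20, so the divide-by-average result starts at 2/5 and the proportion result starts at 2/20. The range transformation maps 2 to the range minimum and 8 to the range maximum.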

Input

A data set from the preceding operator.

Configuration

Parameter Description
Notes Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.
Method Normalization method to use.

Options:

  • Divide-By-Average Transformation: divides each value by the sample's average.
  • Proportion Transformation: divides each value by the sample's sum.
  • Z-Transformation: rescales values using the sample's mean and variance.
  • Range Transformation: rescales values using the sample's minimum and maximum values.
Range Minimum Specify the minimum value of the output range for Range Transformation.
Range Maximum Specify the maximum value of the output range for Range Transformation.
Columns Click Column Names to open the dialog box and select the numeric columns to normalize.
Output Type
  • TABLE outputs a database table. Specifying TABLE enables Storage Parameters.
  • VIEW outputs a database view.
Output Schema The schema for the output table or view.
Output Table The table path and name where the results are output. By default, this is a unique table name based on your user ID, workflow ID, and operator.
Storage Parameters Advanced database settings for the operator output. Available only for TABLE output.

See Storage Parameters Dialog Box for more information.

Drop If Exists Specifies whether to overwrite an existing table.
  • Yes - If a table with the same name exists, it is dropped before the results are stored.
  • No - If a table with the same name exists, the results window shows an error message.
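
The interaction between Output Type and Drop If Exists can be illustrated with a small sketch. It is not the SQL that the operator generates: it uses Python's built-in sqlite3 module as a stand-in database, and the function name `store_normalized`, the source table `src`, and its columns `id` and `x` are all hypothetical. The normalization shown is a range transformation to 0 through 1.

```python
import sqlite3


def store_normalized(conn, output_name, output_type="TABLE", drop_if_exists=True):
    """Range-normalize column x of the hypothetical table src and store the result."""
    if output_type not in ("TABLE", "VIEW"):
        raise ValueError("output_type must be TABLE or VIEW")
    if drop_if_exists:
        # Corresponds to Drop If Exists = Yes: remove any earlier output first.
        conn.execute(f'DROP {output_type} IF EXISTS "{output_name}"')
    # With drop_if_exists=False and an existing object of the same name,
    # this statement fails, which mirrors the error reported for "No".
    conn.execute(
        f'CREATE {output_type} "{output_name}" AS '
        "SELECT id, (x - s.lo) * 1.0 / (s.hi - s.lo) AS x "
        "FROM src CROSS JOIN (SELECT MIN(x) AS lo, MAX(x) AS hi FROM src) AS s"
    )
```

Calling the function twice with `drop_if_exists=False` raises `sqlite3.OperationalError` on the second call, whereas the default replaces the earlier output.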

Output

Visual Output
Displays the data rows of the output table or view (up to 200 rows).

Data Output
A data set containing the normalized data.