Correlation (DB)
Calculates the correlation between each pair of selected numeric columns (attributes) in a data set, so that the columns can be analyzed relative to one another.
Information at a Glance
Note: The Correlation (DB) operator is for database data only. For Hadoop data, use the Correlation (HD) operator.
Algorithm
The covariance between two variables (X and Y) is calculated as shown in the following formula:
$$\operatorname{cov}(X, Y) = E\left[(X - \bar{X})(Y - \bar{Y})\right]$$
where $\bar{X}$ and $\bar{Y}$ are the mean values for X and Y, respectively.
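The correlation is calculated by normalizing the covariance, as shown in the following formula:
$$\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively.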
Note: The PCA operator is a multivariate modeling operator that also determines the covariance and correlation between variables. However, it goes a step further by applying a mapping of the variables into a reduced Principal Component space.
For information about correlation and covariance, see Correlation and Covariance.
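The covariance and correlation calculation described in this section can be sketched in a few lines of Python. This is an illustration only, not the operator's implementation: the function names and sample values are hypothetical, and the sketch assumes the population form of covariance (dividing by n). Whether the operator divides by n or n - 1 is not stated here; as long as the same divisor is used for the covariance and the standard deviations, it cancels out of the correlation.

```python
import math


def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means.

    Assumption: divides by n, not n - 1.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance normalized by the product of the two standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # cov(X, X) is the variance of X
    std_y = math.sqrt(covariance(ys, ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical sample columns, for illustration only.
temperature = [10.0, 20.0, 30.0, 40.0]
sales = [12.0, 24.0, 33.0, 47.0]
print(correlation(temperature, sales))
```

Dividing by the standard deviations removes the units of the two columns, which is why the result can be compared across column pairs that are measured on different scales.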
Output
- Visual Output
- The correlation coefficient table. Each coefficient measures how strongly the two variables are linearly related and ranges from -1 to 1. The value is 1 when a column is compared against itself. A negative value indicates an inverse relationship (that is, as one value goes up, the other goes down).
- Data Output
- None. This is a terminal operator.
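To make the shape of the correlation coefficient table concrete, the following minimal sketch builds a pairwise table for a few in-memory columns. It is not how the operator runs against a database: the column names, data values, and the helper functions `pearson` and `correlation_table` are all hypothetical and exist only for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Return a nested dict: table[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns, for illustration only.
data = {
    "price": [1.0, 2.0, 3.0, 4.0],
    "demand": [8.0, 6.0, 4.0, 2.0],
    "stock": [2.0, 1.0, 4.0, 3.0],
}

table = correlation_table(data)
for name, row in table.items():
    print(name, [round(row[other], 3) for other in table])
```

In this sample, each column correlates with itself at 1, and `price` and `demand` correlate at -1 because demand falls by a fixed amount each time price rises. The table is symmetric, so only the values on one side of the diagonal carry new information.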
Copyright © 2021. Cloud Software Group, Inc. All Rights Reserved.