Summary Statistics (DB)
Provides useful summary information for the selected columns of the data set passed by the preceding operator.
Information at a Glance
Parameter | Description |
---|---|
Category | Explore |
Data source type | DB |
Send output to other operators | No |
Data processing tool | n/a |
Input
A data set from the preceding operator.
Restrictions
The Summary Statistics operator does not work with generic JDBC data sets. See the Summary Statistics (DB).
Configuration
Parameter | Description |
---|---|
Notes | Notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk appears on the operator. |
Columns | Select the numeric columns for which the summary statistics should be displayed. |
Group By | Click Select Columns to open the dialog for selecting the available columns from the input dataset to group results by. |
Calculate the Number of Distinct Values (slower) | Determines whether to calculate the number of distinct values for the selected columns. Calculating distinct values can add significant processing time. Default value: Yes |
Number of most Common Values to Display | Determines the maximum number of most common values to output for each column. Available only when Calculate the Number of Distinct Values is enabled. |
Output Schema | The schema for the output table or view. |
Output Table | Specify the table path and name where the output of the results is generated. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
Storage Parameters | Advanced database settings for the operator output. Available only for TABLE output. See Storage Parameters dialog for more information. |
Drop If Exists | Specifies whether to overwrite an existing table. |
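
The effect of the Group By and distinct-value settings can be pictured with a small query. The following is a minimal sketch only, not the SQL that the operator generates: it uses Python's built-in `sqlite3` module, and the `sales` table and its `region` and `amount` columns are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical in-memory table; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 10.0), ("east", 25.0),
     ("west", 5.0), ("west", None), ("west", 0.0)],
)

# One summary row per group: row count, non-null count, distinct count,
# minimum, maximum, and average of the selected column.
# COUNT(DISTINCT ...) is the kind of aggregate that can be slow on large tables.
query = """
    SELECT region,
           COUNT(*)               AS row_count,
           COUNT(amount)          AS non_null_count,
           COUNT(DISTINCT amount) AS distinct_count,
           MIN(amount)            AS min_value,
           MAX(amount)            AS max_value,
           AVG(amount)            AS average
    FROM sales
    GROUP BY region
    ORDER BY region
"""
grouped_summary = conn.execute(query).fetchall()
for row in grouped_summary:
    print(row)
```

In this sketch, dropping the `COUNT(DISTINCT amount)` expression corresponds loosely to setting Calculate the Number of Distinct Values to No, and removing the `GROUP BY` clause corresponds to leaving Group By empty.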
Output
A table that displays the analysis results of the selected fields. The following list shows the default table contents.
- Name
- Data type
- Count
- Unique value count
- Null value count
- Empty value count
- Zero value count
- Min value
- 25% (approx.) - Approximate 25% value for numerical columns.
- Median (approx.) - Approximate median value for numerical columns.
- 75% (approx.) - Approximate 75% value for numerical columns.
- Maximum value
- Standard deviation
- Average
- Positive value count
- Negative value count
- Most Common (Value) - The most common value for the column.
- Most Common (Percentage) - The percentage of the column's values that are equal to the most common value.
- 2nd Most Common (Value) - The second most common value.
- 2nd Most Common (Percentage) - The percentage of the column's values that are equal to the second most common value.
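
To make the listed fields concrete, the following minimal sketch computes most of them for a single numeric column using only the Python standard library. It does not reproduce the operator's implementation. The `summarize` function and its key names are hypothetical, and several choices are assumptions: `None` stands in for a null value, Count includes nulls, percentages are taken over all rows, the standard deviation is the sample standard deviation, and the quartiles are exact rather than approximate.

```python
import statistics
from collections import Counter


def summarize(values, top_n=2):
    """Compute a subset of the listed statistics for one numeric column.

    `values` is a list of numbers in which None represents a null value.
    """
    non_null = [v for v in values if v is not None]
    total = len(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(non_null, n=4, method="inclusive")
    common = Counter(non_null).most_common(top_n)
    return {
        "count": total,  # assumption: nulls are included in the count
        "unique_value_count": len(set(non_null)),
        "null_value_count": total - len(non_null),
        "zero_value_count": sum(1 for v in non_null if v == 0),
        "min": min(non_null),
        "q25": q1,
        "median": median,
        "q75": q3,
        "max": max(non_null),
        "std_dev": statistics.stdev(non_null),  # assumption: sample std dev
        "average": statistics.mean(non_null),
        "positive_value_count": sum(1 for v in non_null if v > 0),
        "negative_value_count": sum(1 for v in non_null if v < 0),
        # (value, percentage of all rows) pairs, most frequent first
        "most_common": [(value, 100.0 * n / total) for value, n in common],
    }


summary = summarize([4, 4, 0, -2, 7, None, 4, 7])
for name, value in summary.items():
    print(f"{name}: {value}")
```

The `top_n` argument plays the role of Number of most Common Values to Display: with the default of 2, the result holds the most common and second most common values with their percentages.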