Statistical Functions

Tip: The DISTINCT keyword can be used to return a result using the unique values only. For example, Avg(DISTINCT[Column]) would return the average of the unique values rather than the average of all values in the specified column. UniqueCount([Column]) is the equivalent of Count(DISTINCT[Column]).

Function

Description

Avg(Arg1, ...)

Returns the average (arithmetic mean) of the arguments. The arguments and the result are of type real. If one argument is given, then the result is the average of all rows. If more than one argument is given, then the result is the average for each row. Null arguments are ignored and do not contribute to the average.

Examples:

Avg([Column])

Avg(2,-3,4) → 1

Avg(-1) → -1
Avg(1.5, -2, 3.5) → 1
Avg(1, null, 3) → 2
Avg(null) → (Empty)

ChiDist(Arg1)

Returns the (upper tail) chi-square p-value of the argument.

Example:

ChiDist(x, deg_freedom)

ChiDist(7.377759, 2) →0.025

ChiInv(Arg1)

Returns the (upper tail) chi-square quantile value of the argument.

Example:

ChiInv(p, deg_freedom)

ChiInv(0.025, 2) → 7.377759

Count(Arg1)

Calculates the number of non-empty values in the argument column, or, if no argument is specified, the total number of rows.

Example:

Count([Column])

CountBig(Arg1)

Calculates the number of non-empty values in the argument column, or, if no argument is specified, the total number of rows. This function returns a LongInteger.

Example:

CountBig([Column])

Covariance(Arg1, Arg2)

Calculates the covariance of two columns given as arguments.

Example:

Covariance([Column1], [Column2])

FDist(Arg1

Returns the upper tail F p-value of the argument.

Example:

FDist(x, deg_freedom1, deg_freedom2)

FDist(6.936728, 1, 10) → 0.025

FInv(Arg1)

Returns the upper tail F quantile value of the argument.

Example:

FInv(p, deg_freedom1, deg_freedom2)

FInv(0.025, 1, 10) → 6.936728

First(Arg1)

Returns the first valid value based on the physical order of the rows of data in the argument column.

Example:

First([Column])

GeometricMean()

Calculates the geometric mean value. If any input value is negative then the result will be "Empty". If any input value is equal to zero then the result will be zero.

Example:

GeometricMean([Sales])

IQR(Arg1)

Calculates the value difference Q3-Q1, or, the 75th percentile minus the 25th percentile. IQR is also referred to as the H-spread.

Example:

IQR([Column])

L95(Arg1)

Calculates the lower endpoint of the 95% confidence interval.

Note: The static t-value of 1.959964, used by this function, is adapted for large sample sizes (n >= 40). For smaller sample sizes, use the following expression instead:

Avg([Value])-TInv(0.025,Count()-1)*StdDev([Value])/Sqrt(Count())

Example:

L95([Column])

Lag(Arg1, Arg2)

Shifts the values in a column downward by a specified number of steps. The first argument is the column to shift. The second (optional) argument is the number of steps. Default is 1.

If a negative number of steps is used, then the values are shifted in the opposite direction, see image below.

Note that the Lag function is applied to the data in the order the data was loaded; the function does not take sorting in visualizations into account and any changes to the data (e.g., during reload) might result in different values for the various rows.

Examples:

Lag([Column])

Lag([Column], 3)

Last(Arg1)

Returns the last valid value based on the physical order of the rows of data in the argument column.

Example:

Last([Column])

LastValueForMax(Arg1, Arg2)

Returns the value of column 2 for the maximum value of column 1.

If there are more than one of the column 1 maximum value, then the result will be the value for the last max row. See also ValueForMax.

Example:

LastValueForMax([Column 1], [Column 2])

LastValueForMin(Arg1, Arg2)

Returns the value of column 2 for the minimum value of column 1.

If there are more than one of the column 1 minimum value, then the result will be the value for the last min row. See also ValueForMin.

Example:

LastValueForMin([Column 1], [Column 2])

LAV(Arg1)

Calculates the lower adjacent value.

Example:

LAV([Column])

Lead(Arg1, Arg2)

Shifts the values in a column upward by a specified number of steps. The first argument is the column to shift. The second (optional) argument is the number of steps. Default is 1.

If a negative number of steps is used, then the values are shifted in the opposite direction, see image below.

Note that the Lead function is applied to the data in the order the data was loaded; the function does not take sorting in visualizations into account and any changes to the data (e.g., during reload) might result in different values for the various rows.

Examples:

Lead([Column])

Lead([Column], 3)

LIF(Arg1)

Calculates the lower inner fence. This is the threshold located at Q1 – (1.5*IQR).

Example:

LIF([Column])

LOF(Arg1)

Calculates the lower outer fence. This is the threshold located at Q1 – (3*IQR).

Example:

LOF([Column])

Max(Arg1, ...)

Calculates the maximum value. If one argument is given, then the result is the maximum for the entire column. If more than one argument is given, then the result is the maximum for each row. The argument and the result are of type real. Null arguments are ignored.

Examples:

Max([Column])

Max(-1) → -1
Max (1.5, -2, 3) → 3
Max (1, null, 3) → 3
Max (null) → (Empty)

MeanDeviation(Arg1, ...)

Calculates the mean deviation value (average absolute deviation, AAD). If one argument is given, then the result is the mean deviation of all rows. If more than one argument is given, then the result is the mean deviation for each row.

Examples:

MeanDeviation([Column])

MeanDeviation(2,-3,4) → 2.67

Median(Arg1)

Calculates the median of the argument. If one argument is given, then the result is the median of all rows. If more than one argument is given, then the result is the median for each row.

Examples:

Median([Column])

Median(2,-3,4)

MedianAbsoluteDeviation(Arg1, ...)

Calculates the median absolute deviation value (MAD). If one argument is given, then the result is the median absolute deviation of all rows. If more than one argument is given, then the result is the median absolute deviation for each row.

Examples:

MedianAbsoluteDeviation([Sales])

MedianAbsoluteDeviation(2,-3,4)

Min(Arg1, ...)

Calculates the minimum value. If one argument is given, then the result is the minimum for the entire column. If more than one argument is given, then the result is the minimum for each row. The argument and the result are of type real. Null arguments are ignored.

Examples:

Min([Column])

Min(-1) → -1
Min (1.5, -2, 3) → -2
Min (1, null, 3) → 1
Min (null) → (Empty)

NormDist(Arg1)

Returns the (upper tail) normal p-value of the argument. If you do not specify them yourself, the default is mean=0 and standard deviation=1.

Example:

NormDist(x, mean, standard_dev)

NormDist(1.96) → 0.025

NormInv(Arg1)

Returns the (upper tail) normal quantile value of the argument. If you do not specify them yourself, the default is mean=0 and standard deviation=1.

Example:

NormInv(p, mean, standard_dev)

NormInv(0.025) → 1.96

NthLargest(Arg1, Arg2)

The nth largest value. The first argument is the column to analyze and the second argument is the value of n.

If n is larger than the number of values in the column, then the smallest value is returned.

Example:

NthLargest([Column], 10)

NthSmallest(Arg1, Arg2)

The nth smallest value. The first argument is the column to analyze and the second argument is the value of n.

If n is larger than the number of values in the column, then the largest value is returned.

Example:

NthSmallest([Column], 10)

Outliers(Arg1)

Outer value count. Calculates the count of values that are greater than the upper adjacent value or lower than the lower adjacent value.

Example:

Outliers([Column])

P10(Arg1)

The 10th percentile is the value at which 10 percent of the data values are equal to or lower than the value.

Example:

P10([Column])

P90(Arg1)

The 90th percentile is the value at which 90 percent of the data values are equal to or lower than the value.

Example:

P90([Column])

PctOutliers(Arg1)

Outer value percentile. Calculates the percent of values that are greater than the upper adjacent value or lower than the lower adjacent value.

Example:

PctOutliers([Column])

Percent(Arg1, Arg2)

The percent is the value calculated a certain percent above the minimum value within the value range (max value - min value). The first argument is the column to analyze and the second argument is the percent.

Example:

Percent([Column], 15.0)

Percentile(Arg1, Arg2)

The percentile is the value at which a certain percent of the data values are equal to or lower than the value. The first argument is the column to analyze and the second argument is the percent.

Example:

Percentile([Column], 15.0)

Q1(Arg1)

Calculates the first quartile.

Example:

Q1([Column])

Q3(Arg1)

Calculates the third quartile.

Example:

Q3([Column])

Range(Arg1)

The range between the largest and the smallest value in the column.

The result will be presented as a real or a timespan, depending on the data type of the argument.

Example:

Range([Column])

StdDev(Arg1)

Calculates the standard deviation.

Example:

StdDev([Column])

StdErr(Arg1)

Calculates the standard error.

Example:

StdErr([Column])

TDist(Arg1)

Returns the (upper tail) t p-value of the argument.

Example:

TDist(x, deg_freedom)

TDist(4.302653, 2) → 0.025

TERR_Binary

Calls the TIBCO Enterprise Runtime for R engine and returns an output of the specified data type, containing the same number of rows as the input.

The first argument is a script and the following arguments are the arguments to the script.

The column returned must have the same number of rows as the input. At least one argument other than the script is required. The inputs will be placed in variables called input1, input2, ... inputN, and the output must be placed in a variable called output.

Examples:

TERR_Real("output <- input1*100 + input2", [Record No], [Sales])

TERR_String("output <- input1", [String Column])

TERR_Boolean

See TERR_Binary above.

TERR_DateTime

See TERR_Binary above.

TERR_Integer

See TERR_Binary above.

TERR_Real

See TERR_Binary above.

TERR_String

See TERR_Binary above.

TERRAggregation_Binary

Calls the TIBCO Enterprise Runtime for R engine and returns an output of the specified data type. The first argument is a script and the following arguments are the arguments to the script.

The script should return a single aggregated value. At least one argument other than the script is required. The inputs will be placed in variables called input1, input2, ... inputN, and the output must be placed in a variable called output.

Examples:

TERRAggregation_Real("output <- median(input1) + median(input2)", [X], [Y])

TERRAggregation_String("output <- input1[1]", [Customer Name])

TERRAggregation_Boolean