Variable Specification for ETL Time Indexed Analysis

On the STATISTICA ETL Advanced tab, click the Variable specs button to display the Variable specification dialog box, which provides data cleaning and aggregation options for each selected output variable.

Element Name: Description
Minimum permissible value: Converts any value less than the specified floating point number to missing data. This data cleaning method applies to continuous variables only.
Maximum permissible value: Converts any value greater than the specified floating point number to missing data. This data cleaning method applies to continuous variables only.
Maximum run length for same value (invariance): Converts any run (i.e., a series of repeated consecutive values) longer than the specified integer value to missing data. This data cleaning method applies to continuous variables only.
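The three data cleaning options above can be illustrated with a minimal Python sketch. This is not STATISTICA code: the function name `clean_series`, its parameters, and the use of NaN as the missing-data marker are all hypothetical. The sketch also assumes that range checks run before the invariance check and that an over-long run is replaced in its entirety; the dialog description does not state either detail.

```python
import math

MISSING = float("nan")  # stand-in for missing data; an assumption of this sketch


def clean_series(values, min_value=None, max_value=None, max_run_length=None):
    """Return a cleaned copy of a continuous series (hypothetical helper)."""
    cleaned = list(values)

    # Range checks: values below the minimum or above the maximum become missing.
    for i, v in enumerate(cleaned):
        if min_value is not None and v < min_value:
            cleaned[i] = MISSING
        elif max_value is not None and v > max_value:
            cleaned[i] = MISSING

    # Invariance check: a run of identical consecutive values that is longer
    # than max_run_length becomes missing (whole run replaced; an assumption).
    if max_run_length is not None:
        n = len(cleaned)
        start = 0
        while start < n:
            end = start + 1
            # NaN never equals NaN, so missing values never form a run.
            while end < n and cleaned[end] == cleaned[start]:
                end += 1
            if end - start > max_run_length:
                for i in range(start, end):
                    cleaned[i] = MISSING
            start = end

    return cleaned


raw = [1.0, 5.0, 5.0, 5.0, 5.0, 200.0, -3.0, 7.0]
result = clean_series(raw, min_value=0.0, max_value=100.0, max_run_length=3)
```

In this example the four consecutive 5.0 readings exceed the run limit of 3, 200.0 exceeds the maximum, and -3.0 falls below the minimum, so only the first and last readings survive.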
Aggregation statistics type: Summarizes data by central tendency (mean, median, mode), variation (std. dev.), range (minimum, maximum), or total (sum). Select a field, and click the down arrow to display the following commands:
  • Mean: The number calculated by adding a group of numbers and then dividing by the count of those numbers. This function is the default for continuous variables and applies to continuous variables only.
  • Median: The middle number of a group of numbers; that is, half the numbers have values that are greater than the median, and half the numbers have values that are less than the median. This function applies to continuous variables only.
  • Sum: The total of a series of numbers. This function applies to continuous variables only.
  • Minimum: The smallest number in a set of numbers. This function applies to continuous variables only.
  • Maximum: The largest number in a set of numbers. This function applies to continuous variables only.
  • Mode: The most frequently occurring value in a group of values. In case of a tie, the first value is selected. This function is the default for categorical variables and applies to categorical variables only.
  • Std. dev. (Standard Deviation): A measure of how widely values are dispersed from the average value (the mean). This function applies to continuous variables only.
  • First: Takes the first value in a group.
  • Last: Takes the last value in a group.
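As a rough illustration of the aggregation statistics listed above, the following Python sketch maps each option to a standard-library function. The names `AGGREGATORS`, `aggregate`, and `mode_first` are hypothetical and not part of STATISTICA. Two details are assumptions: "first value" in a mode tie is read as the value that appears first in the group, and the standard deviation is computed as the sample standard deviation (divisor n - 1), since the description does not say which form the product uses.

```python
import statistics


def mode_first(values):
    """Most frequent value; on a tie, the value seen first wins (assumption)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    best = max(counts.values())
    for v in values:
        if counts[v] == best:
            return v


# Hypothetical lookup table: option name -> function applied to one group.
AGGREGATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
    "minimum": min,
    "maximum": max,
    "mode": mode_first,
    "std_dev": statistics.stdev,  # sample form; needs at least two values
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
}


def aggregate(values, statistic):
    """Apply the named aggregation statistic to a non-empty group of values."""
    return AGGREGATORS[statistic](values)


readings = [2.0, 4.0, 4.0, 6.0, 9.0]
summary = {name: aggregate(readings, name)
           for name in ("mean", "median", "sum", "minimum", "maximum",
                        "std_dev", "first", "last")}
tie_winner = aggregate(["B", "A", "A", "B"], "mode")
```

Here "B" and "A" each occur twice, so under the tie rule assumed above the mode is "B" because it appears first.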
If no data found in interval, replace with: Replaces blank records with one of three types of user-specified values. This data cleaning method applies to both continuous and categorical variables:
  • Missing Data: Displays the value of the Missing Data (MD) code for a given variable as specified in the STATISTICA Spreadsheet Variable Specifications dialog box. The global Default Missing Data Value (i.e., -999999998) is set on the Spreadsheets tab of the Options dialog box, accessible from the Tools menu.
  • Duplicate of previous aggregate: Displays the modified value (based on the selected Aggregation statistics type) of the previous observation.
  • Duplicate of previous observation: Displays the actual value of the previous observation.
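The three replacement options can be sketched in Python as follows. This is an interpretation, not STATISTICA's implementation: the function `aggregate_intervals`, its parameters, and the option strings are hypothetical. The sketch assumes fixed-width half-open intervals, observations sorted by time, that "previous aggregate" means the aggregated value of the most recent non-empty interval, and that "previous observation" means the last raw value seen before the empty interval. When no earlier data exist, it falls back to the missing-data code.

```python
import statistics

MISSING = -999999998  # default missing-data code quoted in the text above


def aggregate_intervals(observations, start, width, n_intervals,
                        aggregator=statistics.mean, if_empty="missing"):
    """Aggregate (time, value) pairs into fixed-width intervals.

    if_empty is one of "missing", "previous_aggregate", or
    "previous_observation" (hypothetical option names); any other
    string is treated like "missing".
    """
    results = []
    last_aggregate = MISSING    # aggregate of the latest non-empty interval
    last_observation = MISSING  # latest raw value seen so far

    for k in range(n_intervals):
        lo = start + k * width
        hi = lo + width
        # Half-open interval [lo, hi); input is assumed sorted by time.
        bucket = [v for t, v in observations if lo <= t < hi]

        if bucket:
            value = aggregator(bucket)
            last_aggregate = value
            last_observation = bucket[-1]
        elif if_empty == "previous_aggregate":
            value = last_aggregate
        elif if_empty == "previous_observation":
            value = last_observation
        else:
            value = MISSING
        results.append(value)

    return results


obs = [(0, 10.0), (1, 20.0), (7, 40.0)]
as_missing = aggregate_intervals(obs, 0, 2, 4, if_empty="missing")
as_prev_agg = aggregate_intervals(obs, 0, 2, 4, if_empty="previous_aggregate")
as_prev_obs = aggregate_intervals(obs, 0, 2, 4, if_empty="previous_observation")
```

With a width of 2, the intervals starting at 2 and 4 contain no data. They are filled with the missing-data code, with 15.0 (the mean of the first interval), or with 20.0 (the last raw value in the first interval), depending on the option chosen.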