Statistica ETL: ID Based Advanced Tab

Select the Advanced tab of the Statistica ETL: ID-Based dialog box to access the options described here. This tab contains the items from the Quick tab plus the less commonly used ID-based Statistica ETL options.

The grid at the top lists the data sources and their associated properties such as data source ID (for recording/editing SVB macros), data type, and name. When your cursor hovers over a data source in the grid, a ToolTip displays the full path of the data source location.

Element Name Element Type Description
Add data source Button Click this button to display the Select Data Sources dialog box, where you can specify a data source to add to the list
Remove data source Button Select a data source in the grid at the top of the tab and click this button to remove that data source from the grid
Note: To change the sequence in which the data sources are aligned/merged, select a data source and click the arrow buttons at the side of the grid. Click the down arrow to demote the selected source; click the up arrow to promote the selected source
Options/properties applicable to selected data source above These options apply to the data source selected in the grid
Variables Button Click this button to display the Select Variables dialog box, where you select ID and output variables. If you also select one or more optional Time variable(s), then the Time Variable Specs button is enabled
Variable specs Button Click this button to display the Variable Specification dialog box, which provides data cleaning and aggregation options for each selected output variable. Adjacent to this button, the word default is displayed if no output specifications have been changed
Time Variable Specs Button Available only if a Time variable was selected. Click this button to display the Time Variable Specification dialog box, which provides data cleaning and aggregation options for each selected Time variable. Adjacent to this button, the word default is displayed if no Time specifications have been changed
Assume data is sorted ascending by ID variable (merge step is faster) Check box Select this check box to significantly decrease runtime for very large data sets that are pre-sorted in ascending order by the Identifier variable. If the data are not sorted in ascending order by the Identifier variable, a warning message is displayed.
Use input data case selection conditions Check box Select this check box to specify that Statistica uses a subset of cases as defined in the input spreadsheet conditions. Click the Edit button to display the Analysis/Graph Case Selection Conditions dialog box
Use variable prefix Check box When this check box is selected, variables in the output contain a default prefix, which is the data source name. The prefix can be changed in the adjacent edit box
Only use when sources have duplicate variable names Check box When this check box is selected, only duplicate variables contain a prefix
Merge properties for all data sources Data sources are merged by matching on the selected Identifier variables and optional time variables
Preserve order in data Check box Select this check box to retain the original Identifier (i.e., Class ID) order. Merge results are sorted by the order of identifiers in the first data source, then by the order of identifiers in each subsequent data source as numbered in the ID column of the data source grid
Unmatched cases These options specify how unequal numbers of cases are handled
  • Fill with MD: Select this option to pad unmatched cases with missing data. This is the default option
  • Delete cases: If this option button is selected, cases from input data sources that cannot be matched are removed from the results spreadsheet
  • Generate Cartesian: Select this option to create a cross product of every unmatched case with every other case (i.e., if a unique case is found in only data source 1 or data source 2, then every combination of that case with every other case is created)
  • Abort merge: When this option button is selected, the presence of unmatched cases in any data source causes an error message to be displayed and the merge procedure to be abandoned. Note that the Abort merge option only works when the variable specifications for all data sources declare an Aggregation statistic type of None
Multiple Cases These options specify how duplicate matching cases are handled
  • Fill with MD: Select this option to pad duplicate matched cases with missing data. This is the default option
  • Copy down: Select this option to generate a Cartesian product for duplicate matches of the same value
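
The Fill with MD, Delete cases, and Abort merge options for unmatched cases can be illustrated with a short Python sketch. This is not Statistica code or its SVB API; the function merge_by_id, its unmatched parameter, the "id" key, and the sample data are all hypothetical names chosen for illustration. The sketch assumes two data sources, a single Identifier variable with unique values in each source, and no shared output variable names.

```python
MISSING = None  # stands in for a missing data (MD) value; an assumption


def merge_by_id(first, second, unmatched="fill"):
    """Merge two lists of row dicts on the "id" key (hypothetical helper).

    unmatched="fill"   pads unmatched cases with MISSING
    unmatched="delete" drops cases that have no match in the other source
    unmatched="abort"  raises ValueError on the first unmatched case
    """
    if unmatched not in ("fill", "delete", "abort"):
        raise ValueError(f"unknown option: {unmatched!r}")

    first_by_id = {row["id"]: row for row in first}
    second_by_id = {row["id"]: row for row in second}
    first_cols = sorted({k for row in first for k in row} - {"id"})
    second_cols = sorted({k for row in second for k in row} - {"id"})

    # Identifiers in the order of the first source, then any new
    # identifiers in the order of the second source.
    ids = list(first_by_id) + [i for i in second_by_id if i not in first_by_id]

    merged = []
    for case_id in ids:
        matched = case_id in first_by_id and case_id in second_by_id
        if not matched:
            if unmatched == "abort":
                raise ValueError(f"unmatched case: id={case_id!r}")
            if unmatched == "delete":
                continue
        row = {"id": case_id}
        for col in first_cols:
            row[col] = first_by_id.get(case_id, {}).get(col, MISSING)
        for col in second_cols:
            row[col] = second_by_id.get(case_id, {}).get(col, MISSING)
        merged.append(row)
    return merged


source1 = [{"id": 1, "temp": 20.5}, {"id": 2, "temp": 21.0}]
source2 = [{"id": 2, "pressure": 101.3}, {"id": 3, "pressure": 99.8}]

print(merge_by_id(source1, source2, unmatched="fill"))
print(merge_by_id(source1, source2, unmatched="delete"))
```

With this sample data, "fill" keeps IDs 1, 2, and 3 and pads the missing temp or pressure values, "delete" keeps only ID 2, and "abort" raises an error because ID 1 has no match in the second source.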

For descriptions of the buttons, see Statistica ETL: ID-Based - Startup Panel and Quick Tab.

For more information, see also Statistica Extract, Transform, and Load (ETL) Overview.