Stratified Random Sampling

This node creates a new output data spreadsheet containing a stratified random sample of the input data. You can select one or more stratification variables and specify either sampling percentages or approximate numbers of cases for each stratum. You can also request a constant sampling rate for all strata, as well as additional subsetting of variables and/or cases.

Use this option to systematically over-sample rare events, for example, in predictive classification projects.
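As an illustration of over-sampling a rare event, the following minimal Python sketch draws a stratified sample with a different rate per stratum. It uses only the Python standard library and is not the node's own implementation; the function name stratified_sample, the variable name "label", and the data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(rows, stratum_key, rates, seed=0):
    """Draw a simple random sample within each stratum.

    rows        : list of dicts, one dict per case
    stratum_key : name of the stratification variable
    rates       : dict mapping each stratum value to a fraction in [0, 1]
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[stratum_key]].append(row)

    sample = []
    for value, members in groups.items():
        k = round(len(members) * rates[value])  # cases to draw from this stratum
        sample.extend(rng.sample(members, k))   # sampling without replacement
    return sample

# Hypothetical data: 1000 cases, 50 of which belong to the rare "fraud" class.
cases = [{"id": i, "label": "fraud" if i % 20 == 0 else "ok"} for i in range(1000)]

# Keep every rare case but only 10% of the common ones.
sample = stratified_sample(cases, "label", {"fraud": 1.0, "ok": 0.10})
n_fraud = sum(1 for row in sample if row["label"] == "fraud")
n_ok = len(sample) - n_fraud
```

In this sketch the rare class makes up 5% of the input but roughly a third of the sample, which is the effect that per-stratum rates are meant to achieve.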

General

Element Name | Description
Sampling rates | Select whether to draw the stratified sample with the same sampling rate in every stratum or with a different sampling rate for each stratum.
Subset variables | This node creates a data spreadsheet for subsequent analyses. You can either create a subset of the variables, retaining only those that were selected for the analyses (select Yes), or carry along all variables from the original input data source.
Subset cases | Select whether to apply the case selection conditions (if any exist) for the input data file before drawing the stratified random sample. Select No to draw the sample from all cases (observations); select Yes to draw the sample only from the cases selected by the case selection conditions.
Random numbers algorithm | Specify which random number generator to use to select the sample. The fast random number generator is sufficient for most practical applications; choose the Careful random numbers option to use the standard DIEHARD-certified random number generator in Statistica.
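The order in which the General options take effect can be sketched as follows: case selection first, then the stratified draw, then variable subsetting. This is a minimal illustration using only the Python standard library, not the node's own implementation; draw_sample, case_filter, keep_vars, and the data are hypothetical names, and Python's random module stands in for whichever random number generator is selected.

```python
import random

def draw_sample(rows, stratum_var, rate, case_filter=None, keep_vars=None, seed=0):
    rng = random.Random(seed)

    # Subset cases = Yes: apply the case selection condition before sampling.
    if case_filter is not None:
        rows = [row for row in rows if case_filter(row)]

    # Group the remaining cases by stratum.
    strata = {}
    for row in rows:
        strata.setdefault(row[stratum_var], []).append(row)

    # Draw the same fraction from every stratum (common sampling rate).
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * rate)))

    # Subset variables = Yes: keep only the selected variables in the output.
    if keep_vars is not None:
        sample = [{name: row[name] for name in keep_vars} for row in sample]
    return sample

# Hypothetical input: 200 cases with four variables.
cases = [
    {"id": i, "region": "N" if i % 2 else "S", "age": 10 + i % 50, "income": 1000 * i}
    for i in range(200)
]

sample = draw_sample(
    cases,
    "region",
    0.25,
    case_filter=lambda row: row["age"] >= 18,  # stands in for a case selection condition
    keep_vars=["id", "region"],
)
```

Passing case_filter=None and keep_vars=None corresponds to selecting No for both subsetting options, so all cases are eligible and all variables are carried along.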

Common sampling rate

Element Name | Description
Percent or number of cases | Determines whether the sampling rate is defined by the percentage value or by the approximate number of cases. The parameters on this tab apply only when the Sampling rates parameter is set to Common (same sampling rate for all strata).
Percent of cases | The percentage of cases to extract (sample) from the input file; the same sampling rate is applied to all strata. Applicable only when sampling a percentage of cases.
Number of cases | The approximate number of cases to extract (sample) from the input file; the same sampling rate is applied to all strata. Applicable only when sampling a specific number of cases.
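The sketch below shows one plausible reason why the requested number of cases is only approximate. It assumes that the number is first converted to a single sampling fraction (requested cases divided by total cases) and that each stratum's count is then rounded separately; this conversion is an assumption made for illustration, not documented node behavior, and common_rate and the stratum sizes are hypothetical.

```python
def common_rate(total_cases, percent=None, number=None):
    """Return the single sampling fraction applied to every stratum."""
    if (percent is None) == (number is None):
        raise ValueError("specify exactly one of percent or number")
    if percent is not None:
        return percent / 100.0
    # Assumed conversion: requested cases as a share of all cases, capped at 100%.
    return min(1.0, number / total_cases)

# Hypothetical stratum sizes in the input file.
stratum_sizes = {"A": 333, "B": 333, "C": 334}
total = sum(stratum_sizes.values())

# Request approximately 100 cases overall.
rate = common_rate(total, number=100)

# Each stratum is rounded on its own, so the total can miss the request slightly.
per_stratum = {name: round(size * rate) for name, size in stratum_sizes.items()}
drawn = sum(per_stratum.values())
```

Here the request for 100 cases yields 33 cases from each of the three strata, or 99 in total.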

Different sampling rate

Element Name | Description
Percent or number of cases | Determines whether the sampling rates are defined by percentage values or by approximate numbers of cases. The parameters on this tab apply only when the Sampling rates parameter is set to Different rates (for each stratum).
Percent of cases | The percentages of cases to extract (sample) from each stratum in the input file; only integer values are allowed. Applicable only when sampling a percentage of cases.
 To specify separate sampling rates for different strata, enter the respective percentage for each stratum, separated by blanks; the number of sampling percentages must match the number of strata defined by the stratification variables (categorical predictors).
 For multiple stratification variables, specify the sampling percentages as a single long array of numbers. For example, if two stratification variables exist, specify the values in the order (StratVar1:StratVar2) 1:1 1:2 1:3 2:1 2:2 2:3 ...; if the two variables define 2 x 2 = 4 strata (groups), enter '10 10 10 10' in this edit field to return approximately 10% of the cases in each stratum.
Numbers of cases | The number of cases to extract (sample) from each stratum in the input file; only integer values are allowed. Applicable only when sampling specific numbers of cases.
 To specify separate sample sizes for different strata, enter the respective N for each stratum, separated by blanks; the number of sample N's must match the number of strata defined by the stratification variables (categorical predictors).
 For multiple stratification variables, specify the sample N's as a single long array of numbers. For example, if two stratification variables exist, specify the values in the order (StratVar1:StratVar2) 1:1 1:2 1:3 2:1 2:2 2:3 ...; if the two variables define 2 x 2 = 4 strata (groups), enter '10 10 10 10' in this edit field to return approximately 10 cases from each stratum.
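The stratum ordering described above, in which the last stratification variable varies fastest, can be reproduced with a short Python sketch that also checks that the number of entered values matches the number of strata. The helper names strata_order and pair_values are hypothetical, and the sketch assumes that the levels of each variable are coded 1, 2, 3, and so on.

```python
from itertools import product

def strata_order(levels_per_variable):
    """List stratum codes with the last variable varying fastest."""
    ranges = [range(1, n + 1) for n in levels_per_variable]
    return [":".join(str(code) for code in combo) for combo in product(*ranges)]

def pair_values(levels_per_variable, values_text):
    """Match a blank-separated list of integers to the strata, in order."""
    labels = strata_order(levels_per_variable)
    values = [int(v) for v in values_text.split()]
    if len(values) != len(labels):
        raise ValueError(f"expected {len(labels)} values, got {len(values)}")
    return dict(zip(labels, values))

# Two stratification variables with 2 and 3 levels: six strata.
order = strata_order([2, 3])

# Two variables with 2 levels each: four strata, one value per stratum.
rates = pair_values([2, 2], "10 10 10 10")
```

For the 2 x 3 case, order lists the strata as 1:1, 1:2, 1:3, 2:1, 2:2, 2:3. Entering too few or too many values raises an error in this sketch, mirroring the requirement that the number of values match the number of strata.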