Random Sampling (DB)
Extracts data rows from the input data set and generates sample tables/views according to the sample properties (percentage or row count) the user specifies.
Information at a Glance
The Random Sampling (DB) operator is for database data only. For Hadoop data, use the Random Sampling (HD) operator.
Configuration
Parameter | Description |
---|---|
Notes | Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator. |
Number of Samples | The number of samples to generate. The samples are in the form of either database tables or views. For example, if the user inputs 3 in this field, 3 sample tables/views are generated. |
Sample By | The size of samples by Percentage or by Number of Rows. |
Sample Size | The number of rows to generate for each sample data set. This property is interpreted in conjunction with the
Sample By property.
|
Random Seed | The seed used for the pseudo-random row extraction. |
Consistent | Determines whether the operator always creates the same set of random rows for each sample data generation.
Default value: false. |
Replacement | Specifies whether this is sampling with or without replacement.
If Replacement is selected, both the Consistent and Disjoint properties are set to false and disabled. |
Disjoint | Specify whether each sample should be drawn from the entire data set, or from the remaining rows after previous samples are excluded.
If set to true, then Replacement must be false. |
Key Columns | Used in conjunction with the
Consistent property.
|
Output Schema | The schema for the output table or view. |
Output Table | The table path and name where the results are output. By default, this is a unique table name based on your user ID, workflow ID, and operator. |
Storage Parameters | Advanced database settings for the operator output. Available only for
TABLE output.
See Storage Parameters Dialog Box for more information. |
Drop If Exists | Specifies whether to overwrite an existing table. |
Output
- Visual Output
- The data rows of the output table/view of each generated sample displayed (up to 2000 rows of the data).
- Data Output
- A data set of sample data tables created. Typically, the data set is passed on to a Sample Selector operator, such as Train and Test, to select a sample to use with subsequent operators.