Random Sampling (HD)
Extracts data rows from the input data set and generates one or more sample data sets, sized according to the sample properties (percentage or row count) that the user specifies.
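The following Python sketch illustrates how a sample size given either as a percentage or as a row count can be turned into a target number of rows and then drawn as a seeded simple random sample. It is not the operator's implementation, which runs as a MapReduce job and may select rows differently (for example, by keeping each row independently with some probability, which yields only approximately the requested percentage). The names `draw_sample` and `target_row_count`, the `"percentage"`/`"rows"` values, and the round-down rule are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def target_row_count(total_rows: int, sample_by: str, sample_size: float) -> int:
    """Hypothetical mapping from a (Sample By, Sample Size) pair to a row count."""
    if sample_by == "percentage":
        if not 0 <= sample_size <= 100:
            raise ValueError("percentage must be between 0 and 100")
        # Assumption: round down so the sample never exceeds the requested fraction.
        return int(total_rows * sample_size / 100)
    if sample_by == "rows":
        if sample_size < 0:
            raise ValueError("row count must be non-negative")
        # A sample without replacement cannot contain more rows than the input.
        return min(int(sample_size), total_rows)
    raise ValueError(f"unknown sample_by: {sample_by!r}")


def draw_sample(rows: Sequence[T], sample_by: str, sample_size: float, seed: int) -> List[T]:
    """Draw one simple random sample (no replacement) using a seeded generator."""
    rng = random.Random(seed)  # same seed and input -> same sample
    k = target_row_count(len(rows), sample_by, sample_size)
    return rng.sample(list(rows), k)


if __name__ == "__main__":
    data = list(range(1000))
    ten_percent = draw_sample(data, "percentage", 10, seed=42)
    print(len(ten_percent))  # 100 rows
    print(len(draw_sample(data, "rows", 25, seed=42)))  # 25 rows
```

Capping the row count at the input size is one reasonable way to handle a request larger than the data set; the actual operator may instead report an error or behave differently.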
Information at a Glance
Category | Sample |
Data source type | HD |
Sends output to other operators | Yes |
Data processing tool | MapReduce |
The Random Sampling (HD) operator is for Hadoop data only. For database data, use the Random Sampling (DB) operator.
Configuration
Parameter | Description |
---|---|
Notes | Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator. |
Number of Samples | The number of samples to generate. Each sample is written as a Hadoop file. For example, if the user enters 3 in this field, 3 sample files are generated. |
Sample By | Specifies whether the sample size is expressed as a Percentage of the input rows or as a Number of Rows. |
Sample Size | The size of each sample data set. This value is interpreted according to the Sample By property: as a percentage of the input rows when Sample By is Percentage, or as a row count when Sample By is Number of Rows. |
Random Seed | The seed used for the pseudo-random row extraction. |
Consistent | Specifies whether the operator generates the same set of random rows every time the samples are generated. |
Replacement | Specifies whether sampling is done with replacement, so that a single row of data can be selected more than once. If set to true, the Consistent and Disjoint properties are both set to false and disabled. |
Disjoint | Specifies whether each sample is drawn from the entire data set or only from the rows that remain after the rows of previous samples are excluded. If set to true, Replacement must be false. |
Store Results? | Specifies whether to store the results. |
Results Location | The HDFS directory where the results of the operator are stored. This is the main directory, the subdirectory of which is specified in Results Name. Click Choose File to open the Hadoop File Explorer Dialog Box and browse to the storage location. Do not edit the text directly. |
Results Name | The name of the file in which to store the results. |
Overwrite | Specifies whether to delete any existing data at the specified path and file name before storing the results. |
Compression | Select the type of compression for the output. Available Avro compression options are the following. |
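To make the interplay between Number of Samples, Random Seed, Replacement, and Disjoint concrete, here is a minimal Python sketch of drawing several samples under each mode. It only mirrors the rules stated in the table above and is not the operator's MapReduce implementation; the function `draw_samples` and its parameters are hypothetical. A fixed seed makes the run repeatable, which is conceptually similar to what Consistent provides, though how the operator actually implements Consistent is not described here.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def draw_samples(rows: Sequence[T], num_samples: int, k: int, seed: int,
                 replacement: bool = False, disjoint: bool = False) -> List[List[T]]:
    """Hypothetical multi-sample draw mirroring the Replacement/Disjoint rules."""
    if replacement and disjoint:
        # Mirrors the rule that Disjoint requires Replacement to be false.
        raise ValueError("Disjoint requires Replacement to be false")
    rng = random.Random(seed)  # fixed seed -> repeatable samples
    remaining = list(range(len(rows)))  # row indices still available (disjoint mode)
    samples: List[List[T]] = []
    for _ in range(num_samples):
        if replacement:
            # Independent draws: the same row may appear several times in a sample.
            samples.append(rng.choices(rows, k=k))
        elif disjoint:
            if k > len(remaining):
                raise ValueError("not enough remaining rows for another disjoint sample")
            picked = rng.sample(remaining, k)
            picked_set = set(picked)
            # Exclude the picked indices so later samples cannot reuse these rows.
            remaining = [i for i in remaining if i not in picked_set]
            samples.append([rows[i] for i in picked])
        else:
            # Each sample is drawn from the full data set, without repeats inside it.
            samples.append(rng.sample(list(rows), k))
    return samples


if __name__ == "__main__":
    data = list(range(100))
    parts = draw_samples(data, num_samples=3, k=20, seed=7, disjoint=True)
    print([len(p) for p in parts])  # three samples of 20 rows, no shared rows
```

Working with row indices rather than row values keeps disjoint sampling correct even when the input contains duplicate rows.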
Copyright © Cloud Software Group, Inc. All rights reserved.