Sampling Operators
Sampling (Sample) operators provide ways to obtain a sample of a source dataset.
A model is typically created using a training dataset and then tested against a validation dataset. This is achieved in Team Studio by sampling the source data.
Team Studio provides two primary sampling operators: Random Sampling and Stratified Sampling.
A third sampling operator, Sample Selector, follows either the Random Sampling or the Stratified Sampling operator and passes one of the generated sample datasets to succeeding operators in the workflow.
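The training/validation pattern described above can be illustrated outside Team Studio with a minimal Python sketch. This is not how the operators are implemented. The function name `random_samples`, the fixed seed, and the assumption that the generated samples are disjoint are all hypothetical choices made for illustration.

```python
import random


def random_samples(rows, percentages, seed=0):
    """Split rows into random samples sized by the given percentages.

    Hypothetical helper: assumes the samples are disjoint and that the
    percentages sum to at most 100.
    """
    if sum(percentages) > 100:
        raise ValueError("percentages must not sum to more than 100")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable output
    samples, start = [], 0
    for pct in percentages:
        size = round(len(shuffled) * pct / 100)
        samples.append(shuffled[start:start + size])
        start += size
    return samples


# Stand-in source dataset of 100 rows.
rows = list(range(100))

# Generate a 70% sample and a 30% sample, in the spirit of Random Sampling.
samples = random_samples(rows, [70, 30])

# Pick one generated sample for each downstream step,
# which is the role Sample Selector plays in a workflow.
train, validation = samples[0], samples[1]
```

Here `train` would feed a modeling step and `validation` an evaluation step, so that the model is tested on rows it was not trained on.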
- Random Sampling (DB)
  Extracts data rows from the input data set and generates sample tables/views according to the sample properties (percentage or row count) the user specifies.
- Random Sampling (HD)
  Extracts data rows from the input data set and generates sample tables/views according to the sample properties (percentage or row count) the user specifies.
- Resampling
  Changes the distribution of values in a single column. You can use this operator to either balance all values in the selected column or change the proportion of only one value. You can use it to up-sample or down-sample.
- Sample Selector
  Connects to a preceding sample-generating operator (for example, the Random Sampling operator) and allows you to specify one of the sample data sets generated from that operator for use in succeeding operators.
- Stratified Sampling
  Extracts data rows from the input data set and generates sample tables/views according to the sample properties specified by users.
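As a rough illustration of the ideas behind Stratified Sampling and Resampling, the following Python sketch draws the same percentage from each group of a column, then balances a column by down-sampling. It assumes the usual meaning of stratified sampling (sampling within each stratum so that proportions are preserved). The helper names `group_by`, `stratified_sample`, and `balance_by_downsampling` are hypothetical and do not reflect Team Studio's implementation or property names.

```python
import random
from collections import defaultdict


def group_by(rows, column):
    """Group rows (dicts) by the value found in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return groups


def stratified_sample(rows, column, percentage, seed=0):
    """Draw the same percentage of rows from every group of `column`."""
    rng = random.Random(seed)
    sample = []
    for group in group_by(rows, column).values():
        size = round(len(group) * percentage / 100)
        sample.extend(rng.sample(group, size))  # sampling without replacement
    return sample


def balance_by_downsampling(rows, column, seed=0):
    """Down-sample every group of `column` to the size of the smallest group."""
    rng = random.Random(seed)
    groups = group_by(rows, column)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Stand-in dataset: 20 rows labeled "yes" and 80 rows labeled "no".
rows = [{"id": i, "label": "yes" if i < 20 else "no"} for i in range(100)]

strat = stratified_sample(rows, "label", 50)     # keeps the 20:80 ratio
balanced = balance_by_downsampling(rows, "label")  # equal counts per label
```

The stratified sample keeps the original class ratio, while the balanced sample changes the distribution of the `label` column. Up-sampling would instead repeat rows from the smaller group, for example by drawing with replacement.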
Copyright © Cloud Software Group, Inc. All rights reserved.