Data Science Operator Sample Group

The data science operator sample group provides one small EventFlow application for each operator and most data constructs in the Palette view in StreamBase Studio.

  • An operator is a StreamBase processing unit that performs predefined work on streaming data, such as aggregating windows of data streams, merging streams, or retrieving shared data from a table.

  • A data construct is a component that stores information from a stream or from an external data source for use by an associated Spotfire Streaming operator.

Each sample has a separate README that describes the steps to run that sample.

Component Sample | Description
Data Science Operator Samples
ANOVA Operator Sample | Uses an ANOVA Operator to compute the analysis of variance, a generalization of the t-test for comparing two or more groups with respect to equality of means.
Chi-square Test Operator Sample | Uses a Chi-square Test of Independence Operator to compute the chi-square test of independence between two categorical or discrete random variables, along with summary information such as crosstabulation frequencies and relative frequencies, as well as the Cramer's V statistic.
Predictive Modeling Sample: Classification Trees | Uses a Classification Trees Operator to build classification tree models. It uses the IRIS Flower data (irisdat.csv). The SEPALLEN, SEPALLWID, PETALLEN, and PETALWID features are selected as predictors, and IRISTYPE is selected as the response.
Correlations Operator Sample | Uses a Correlations Operator to gather tuples using various output styles, such as over time or by selected values. The operator creates a matrix (a list of tuples) in which the tuple fields are the columns of the matrix.
Descriptive Statistics Operator Sample | Uses a Descriptive Statistics Operator to provide basic statistical information for each specified variable, including measures of central tendency (for example, the mean) and of dispersion (for example, the standard deviation).
Frequency Tables Operator Sample | Uses a Frequency Tables Operator to compute a contingency table that shows item and combination counts.
Kolmogorov-Smirnov Two Sample Test | Uses a Kolmogorov-Smirnov Test Operator to compute the two-sample Kolmogorov-Smirnov test. This is the nonparametric analogue of the two-sample t-test; instead of comparing means between two groups, it can be used to assess any difference between the two distributions.
Predictive Modeling Sample: Linear Regression | Uses a Linear Regression Operator to build linear regression models. Ordinary least squares, ridge regression, and lasso regression models are supported.
Predictive Modeling Sample: Logistic Regression | Uses a Logistic Regression Operator to build binary logistic regression models.
Predictive Modeling Sample: Multilayer Perceptron Classification | Uses a Multilayer Perceptron Classification Operator to build multilayer perceptron neural networks. It uses the IRIS Flower data (irisdat.csv). The SEPALLEN, SEPALLWID, PETALLEN, and PETALWID features are selected as predictors, and IRISTYPE is selected as the response.
Predictive Modeling Sample: Multilayer Perceptron Regression | Uses a Multilayer Perceptron Regression Operator to build multilayer perceptron neural networks. It uses the Boston Housing 2 data (BostonHousing2.csv). ValueofOccupiedHomes is selected as the response, and the remaining fields are selected as predictors.
Paired T-test Sample | Uses a Paired T-Test Operator to compute the two-sample dependent t-test, which tests the null hypothesis that the population means of two dependent groups, as measured on a single variable, are equal.
Predictive Modeling Sample: Regression Trees | Uses a Regression Trees Operator to build regression tree models. The operator takes data from the feed simulation and starts emitting results after 300 rows have been collected.
Single Sample T-Test Operator | Uses a Single Sample T-Test Operator to compute the single-sample t-test.
Predictive Modeling Sample: Support Vector Machine Classifier | Uses an SVM Classification Operator to build support vector machine classification models.
Predictive Modeling Sample: Support Vector Machine Regression | Uses an SVM Regression Operator to build support vector machine regression models.
Two Sample T-test Sample | Uses a Two Sample T-Test Operator to compute the two-sample independent t-test, which tests the null hypothesis that the population means of two independent groups, as measured on a single variable, are equal.
Two Sample T-Test by Groups Operator | Uses a T-Test By Groups Operator to compute the two-sample independent t-test, which tests the null hypothesis that the population means of two independent groups, as measured on a single variable, are equal.
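
The difference between the dependent (paired) and independent two-sample t-tests listed above can be illustrated outside of StreamBase with a short Python sketch. This is not the operators' implementation. The function names are hypothetical, and the independent version assumes a pooled (equal) variance, which this page does not state. In both cases the null hypothesis is that the two population means are equal, and a t statistic far from zero is evidence against it.

```python
import math
from statistics import mean, stdev, variance


def two_sample_t(a, b):
    """Pooled-variance t statistic for two independent samples (assumed variant)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Weighted average of the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def paired_t(first, second):
    """t statistic for two dependent samples, computed on pairwise differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    # A one-sample t-test of the differences against a mean of zero.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
    print(two_sample_t(group_a, group_b))  # (t statistic, degrees of freedom)
    print(paired_t(group_a, [2.0, 4.0, 5.0, 4.0, 7.0]))
```

The paired version reduces to a single-sample t-test on the differences, which is why it requires the two samples to have the same length and ordering, while the independent version does not.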