Using the Correlations Operator

Introduction

The TIBCO Streaming® Correlations operator computes a correlation (Pearson, Spearman, or Kendall) for each pair of selected variables in the incoming data. The variables are tuple fields of type double. For each pair, the operator reports the correlation together with its test statistic, degrees of freedom, and p-value.

Correlations Properties

This section describes the properties you can set for this operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Class name: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start options: This field provides a link to the Cluster Aware tab, where you configure the conditions under which this operator starts.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow Editor canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Correlation type: Select the type of correlation (Pearson, Spearman, or Kendall) to compute for each pair of selected variables.

Missing data deletion: Select how rows with missing data are handled. With casewise deletion, every row that has missing data in at least one of the selected variables is excluded from the analysis. With pairwise deletion, each correlation between a pair of variables includes all cases that have valid data for both of those variables.

Log Level: Controls the level of verbosity the operator uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.
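
The difference between casewise and pairwise missing data deletion can be illustrated outside the operator. The following Python sketch is not the operator's implementation; the sample rows, the field names x, y, and z, and the use of None to mark a missing value are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical sample data: None marks a missing value.
rows = [
    {"x": 1.0, "y": 2.0, "z": 1.0},
    {"x": 2.0, "y": 4.0, "z": None},
    {"x": 3.0, "y": 6.0, "z": 2.0},
    {"x": 4.0, "y": None, "z": 4.0},
    {"x": 5.0, "y": 10.0, "z": 5.0},
]
variables = ["x", "y", "z"]


def casewise_rows(rows, variables):
    """Keep only rows in which every selected variable has a value."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


def pairwise_rows(rows, a, b):
    """Keep rows in which both variables of this one pair have a value."""
    return [r for r in rows if r[a] is not None and r[b] is not None]


# Casewise: one shared set of rows is used for every correlation.
casewise = casewise_rows(rows, variables)

# Pairwise: each pair of variables gets its own set of usable rows.
pairwise_counts = {
    (a, b): len(pairwise_rows(rows, a, b))
    for a, b in combinations(variables, 2)
}

print(len(casewise))
print(pairwise_counts)
```

In this sample, casewise deletion leaves the same three rows for every pair, while pairwise deletion keeps four rows for the pairs (x, y) and (x, z). Pairwise deletion uses more of the data, but each correlation may then be based on a different number of cases.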

Field Select Tab

The Missing Data tab is used to check incoming tuples against an expression; if the expression evaluates to true, the field is replaced with the given replacement value. This missing data replacement check is done before the input tuple is evaluated further inside the operator. By default, no field values are replaced. You can use the smart fill option to evaluate the current input schema and fill the table with reasonable values.

List 1: Specify the variables for List 1. Regular expression matching is supported. If only List 1 is specified, with p variables, and no variables are specified for List 2, the operator computes the correlation between each pair of variables within List 1 (for a total of p² correlations).

List 2: Specify the variables for List 2. Regular expression matching is supported. If p variables are specified for List 1 and q variables are specified for List 2, the operator computes the correlation between each List 1 variable and each List 2 variable, for a total of p × q correlations.
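
The pair counts described for List 1 and List 2 can be checked with a short Python sketch. This is an illustration only: the schema field names, the helper functions select_fields and correlation_pairs, and the choice of full-string regular expression matching are assumptions, not the operator's documented behavior. The total of p squared for a single list is read here as counting ordered pairs, including each variable paired with itself.

```python
import re
from itertools import product


def select_fields(field_names, patterns):
    """Return the fields that fully match any pattern, in schema order."""
    compiled = [re.compile(p) for p in patterns]
    return [f for f in field_names if any(c.fullmatch(f) for c in compiled)]


def correlation_pairs(list1, list2=None):
    """Enumerate the variable pairs to correlate.

    With only list1, every ordered pair within list1 is produced (p * p).
    With both lists, every list1 variable is paired with every list2
    variable (p * q).
    """
    if not list2:
        return list(product(list1, repeat=2))
    return list(product(list1, list2))


# Hypothetical input schema.
schema = ["price_open", "price_high", "price_close", "volume", "symbol_id"]

list1 = select_fields(schema, [r"price_.*"])            # p = 3
list2 = select_fields(schema, ["volume", "symbol_id"])  # q = 2

within = correlation_pairs(list1)          # 3 * 3 = 9 pairs
across = correlation_pairs(list1, list2)   # 3 * 2 = 6 pairs

print(len(within), len(across))
```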

Cluster Aware Tab

Use the settings in this tab to allow this operator or adapter to start and stop based on conditions that occur at runtime in a cluster with more than one node. During initial development of the fragment that contains this operator or adapter, and for maximum compatibility with TIBCO Streaming releases before 10.5.0, leave the Cluster start policy control in its default setting, Start with module.

Cluster awareness is an advanced topic that requires an understanding of StreamBase Runtime architecture features, including clusters, quorums, availability zones, and partitions. See Cluster Awareness Tab Settings on the Using Cluster Awareness page for instructions on configuring this tab.

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Operator Ports

The operator expects that the variables or fields to be analyzed are of type 'double'. The output tuple will consist of the incoming data passed through along with a list of the following analytic results for each pair of variables analyzed:

  1. Name of first variable

  2. Name of second variable

  3. Correlation (Pearson, Spearman, or Kendall)

  4. Test statistic

  5. Degrees of freedom

  6. P-value associated with testing the hypothesis of no correlation between the two variables
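
As a rough illustration of how the first five result values relate to each other, the following Python sketch computes a Pearson correlation and the usual t statistic for testing the hypothesis of no correlation, with n - 2 degrees of freedom. Whether the operator uses exactly this test is an assumption, and the sample data and result key names are hypothetical. The p-value is omitted because it requires the cumulative distribution function of Student's t distribution, which the Python standard library does not provide.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic and degrees of freedom for H0: no correlation.

    Assumes |r| < 1 and n > 2.
    """
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Hypothetical sample values for two variables of type double.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(a, b)
t, df = t_statistic(r, len(a))

# Key names are illustrative, not the operator's output schema.
result = {
    "variable1": "a",
    "variable2": "b",
    "correlation": r,
    "test_statistic": t,
    "degrees_of_freedom": df,
}
print(result)
```

For this sample the correlation is 0.8 with 3 degrees of freedom. A two-sided p-value would then be obtained by evaluating the t distribution with those degrees of freedom at the computed statistic, for example with a statistics library.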