R Execute (HD)

To configure R Execute, connect a valid data source to the R Execute operator. An intermediate operator also constitutes a data source for R Execute.

Information at a Glance

Category: Tools
Data source type: HD
Sends output to other operators: No
Data processing tool: R engine

R Execute (HD) is for Hadoop data only. For database data, use the R Execute (DB) operator.

For information about configuring and using this operator, see R Execute.

Input

To use the input dataset, refer to the R object alpine_input in the script. alpine_input is a data frame.
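As a minimal sketch of a script that uses the input dataset, the following R code reads alpine_input and builds a per-group summary in alpine_output (the result object described under Output). The column names region and sales and the fallback sample data are hypothetical; they exist only so that the example runs outside Team Studio, where alpine_input is not supplied.

```r
# Outside Team Studio there is no alpine_input, so create a small
# hypothetical stand-in. Inside the operator this branch is skipped.
if (!exists("alpine_input")) {
  alpine_input <- data.frame(
    region = c("east", "west", "east", "west"),
    sales  = c(100, 250, 175, 300),
    stringsAsFactors = FALSE
  )
}

# Referring to alpine_input is what signals that the input data is needed.
# Sum the hypothetical sales column within each region.
alpine_output <- aggregate(sales ~ region, data = alpine_input, FUN = sum)
```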

You can choose not to use the input dataset by not referring to alpine_input in the script, in which case the data is not read into R.

  • If the input is a preceding operator, the preceding operator runs, but the data is not transferred to R.
  • If the input is a data source (a Hadoop file or a database table), the Hadoop data transfer or the database query does not run, which saves execution time.
Note: A data source must be specified even if you do not use the input dataset in the R code. The script might still generate an output data frame, and that data frame must be stored in the same type of data store as the input (Hadoop output for a Hadoop input, database output for a database input) and in the same specific data store (the same Hadoop cluster or the same database).
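A script can also produce output without touching the input. The sketch below never refers to the input object, so under the behavior described above no data would be transferred to R; it builds alpine_output entirely from values generated in the script. The column names run_id and value are hypothetical.

```r
# Build the result entirely inside the script, without reading the input.
set.seed(42)  # make the random draw reproducible

alpine_output <- data.frame(
  run_id = 1:5,
  value  = runif(5)  # five uniform random numbers between 0 and 1
)
```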


Restrictions

See the topic "R Execute Prerequisites" in TIBCO® Data Science Team Studio Installation and Administration for information about package, system, and server requirements.

Configuration

Notes

Any notes or helpful information about this operator's parameter settings. When you enter content in the Notes field, a yellow asterisk is displayed on the operator.

R Script

The R script to execute. Select Define Clause to specify the R-Script.

Results Location

Specifies the HDFS directory where the results of the R Execute operator are stored. This is the main directory; its subdirectory is specified in the Results Name option (see below).

  • Click Choose File to specify the location. Do not edit the text directly.

Results Name

Select the name of the Hadoop file where the results of the R Execute operator are stored.

Overwrite

Determines whether the operator should overwrite an existing file if a file with the same name exists.

Default value: Yes

Pass Output File

Specifies whether to pass the R Execute output to the next operator.

  • If set to Yes, then the Results File Structure details must be configured.
  • If R Execute is a terminal operator, Pass Output File does not need to be set to Yes. When it is set to No, the Results File Structure does not have to be provided.

Default value: No

Results File Structure

Specifies the file structure of the operator's output to pass to the next operator (if Pass Output File is set to Yes).

For more information, see Results File Structure Dialog Box.
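
When filling in the structure, it can help to list the columns of the data frame that the script produces. This is only a sketch: it assumes the dialog asks for column names and types, which this topic does not spell out, and the sample alpine_output below is hypothetical.

```r
# Hypothetical result; in a real script alpine_output comes from your own code.
alpine_output <- data.frame(
  region = c("east", "west"),
  sales  = c(275, 550),
  stringsAsFactors = FALSE
)

# One row per column: its name and its first R class.
column_overview <- data.frame(
  column = names(alpine_output),
  class  = unname(vapply(alpine_output, function(col) class(col)[1], character(1))),
  stringsAsFactors = FALSE
)

print(column_overview)
```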

Output

Visual Output
The table is stored whether or not Pass Output File is set to Yes, and whether or not the Results File Structure is provided. The combination of Pass Output File and Results File Structure is used only to check the integrity of the flow when the R Execute operator is followed by another operator.

If the alpine_output object does not exist in your R code, then the output is not generated. If you set Pass Output File to Yes, and if the R script does not contain a reference to the alpine_output object, then the flow fails at runtime, and an error message is displayed for this inconsistency.
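
Before running the flow, you can check locally that a script defines alpine_output. The helper below, defines_alpine_output, is hypothetical and not part of Team Studio. It evaluates the script text in a fresh environment with a stand-in alpine_input and reports whether a data frame named alpine_output exists afterwards.

```r
# Hypothetical local check; not a Team Studio function.
defines_alpine_output <- function(script_text, input) {
  env <- new.env()                              # fresh workspace for the script
  assign("alpine_input", input, envir = env)    # stand-in for the operator input
  eval(parse(text = script_text), envir = env)  # run the script in that workspace
  exists("alpine_output", envir = env, inherits = FALSE) &&
    is.data.frame(get("alpine_output", envir = env))
}

sample_input <- data.frame(x = c(1, 2, 3))

# This script assigns alpine_output; the second one never does.
good_script <- "alpine_output <- alpine_input[alpine_input$x > 1, , drop = FALSE]"
bad_script  <- "total <- sum(alpine_input$x)"

defines_alpine_output(good_script, sample_input)  # TRUE
defines_alpine_output(bad_script, sample_input)   # FALSE
```

A script that fails this check would produce no output and, with Pass Output File set to Yes, would cause the flow to fail at runtime as described above.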

Data Output



If you create an alpine_output object in the R code, then the R Execute operator output displays the output data frame (persisted in the HDFS/MapR file structure) in the results console in the Data tab.

R-Console Output



If your R code included functions that printed output to the R console, then the output is displayed in the results console in the R-Console Output tab.
Note: The R Execute operator prints to the console differently from the R console or RStudio. In an interactive session, a statement such as summary(alpine_input) displays its result automatically. R Execute instead captures the console output into an object by using R's capture.output function, so you must wrap such calls in print(). For example, instead of summary(alpine_input), specify print(summary(alpine_input)). This is a limitation of how R's capture.output function works.
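
The difference can be reproduced in a plain R session. The sketch below is an illustration, not the operator's actual implementation: it assumes the script is evaluated non-interactively, and it uses two hypothetical wrapper functions to stand in for script bodies. The sample alpine_input is also made up.

```r
# Stand-in for the data frame that R Execute would provide (assumption).
alpine_input <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

# Hypothetical stand-ins for a script body that runs non-interactively.
script_without_print <- function() {
  summary(alpine_input)         # value is computed but never printed
  invisible(NULL)
}
script_with_print <- function() {
  print(summary(alpine_input))  # explicit print writes to the console
  invisible(NULL)
}

# capture.output collects whatever each call writes to the console.
captured_without <- capture.output(script_without_print())
captured_with    <- capture.output(script_with_print())

length(captured_without)   # 0
length(captured_with) > 0  # TRUE
```

Only the variant that calls print() leaves anything for capture.output to collect.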
R-Script



Your R code is shown in the results console in the R-Script tab.