R Execute

Use R Execute as a seamless extension of a Team Studio workflow to include any existing R-coded model.

Features

You can write the R script and run it against an input dataset stored in any database supported by Team Studio, without writing SQL queries or knowing the SQL syntax differences between Oracle, PostgreSQL, and so on.

Team Studio operators can precede or follow any valid R model. Examples include column and row filters (before or after R), histograms (after R), and the input data sources themselves (database table or Hadoop delimited file).

You can write the R script and run it against an input dataset stored in any Hadoop data store supported by Team Studio (for example, HDFS for CDH4/5, Pivotal HD 2.x, Hortonworks, and Apache, or the MapR file system for MapR), without having to deal with the complexity and security issues of accessing these data stores directly.

Important: This design also addresses the issue of having to access different clusters at once, with different Hadoop and therefore HDFS/MapR FS versions.

The data is pulled into R only if the input data frame is specified, saving time if you want to pull the data from somewhere else (for example, the web, bypassing the input), or want to ignore the input and simply generate output (for example, Monte Carlo simulation, random number generation, and so on). See below for usage of the input data frame functionality in your R script.
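As an illustration of a script that ignores the input and only generates output, here is a minimal Monte Carlo sketch in base R. The name `output_df` is a placeholder chosen for this example; this section does not state the variable name the operator expects for the output data frame, so treat it as an assumption.

```r
# Sketch: generate output without reading any input data frame.
# `output_df` is a placeholder name, not the operator's documented variable.
set.seed(42)          # make the simulation reproducible
n_draws <- 10000

# Draw points uniformly in the unit square.
x <- runif(n_draws)
y <- runif(n_draws)

# A point falls inside the quarter circle when x^2 + y^2 <= 1.
inside <- (x^2 + y^2) <= 1

# The fraction of points inside approximates pi / 4.
pi_estimate <- 4 * mean(inside)

output_df <- data.frame(
  n_draws     = n_draws,
  pi_estimate = pi_estimate
)
print(output_df)
```

Because the script never touches an input data frame, no data needs to be pulled from the database or Hadoop before it runs.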

If you specify an output data frame, it can be stored in any database or Hadoop data store supported by Team Studio without you being concerned about how to do it. See below for usage of the output data frame functionality in your R script.
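The following sketch shows the general shape of a script that turns an input data frame into an output data frame using base R only. The names `input_df` and `output_df`, and the sample columns, are hypothetical stand-ins; the variable names the operator actually uses are not given in this section.

```r
# Hypothetical stand-in for the data frame the operator would supply.
input_df <- data.frame(
  region = c("east", "west", "east", "west"),
  sales  = c(100, 250, 300, 50),
  stringsAsFactors = FALSE
)

# Summarise sales per region with base R only.
output_df <- aggregate(sales ~ region, data = input_df, FUN = sum)
names(output_df) <- c("region", "total_sales")

# Printing makes the result visible in the console output as well.
print(output_df)
```

The script contains no SQL and no storage-specific code; it only builds an ordinary R data frame.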

Note: You do not need direct database permissions. You do not need to know how to write SQL queries to create a new table or to insert data. You do not need to know how to do efficient batch insertions and manage transactions.

Requirements

You must have the R Connector installed on a server in your Team Studio deployment. For more information, see your system administrator.

Important

The R Execute Operator is a batch operator.
  • You cannot interactively run an R command or function, see what the interpreter generates, and then have the state of the session stored in memory in order to perform the next experiment. In this sense, the R Execute operator differs from an R session in an R shell/interpreter/REPL (read-evaluate-print loop) or the RStudio IDE.
  • If you run an R script that expects interactive user input, the operator hangs, because R waits for input that never arrives. Do not use any interactive function in a script run by R Execute. This includes the interactive mode of install.packages, which prompts you to choose the repository from which to get the package. For install.packages, avoid the prompt by specifying the repository explicitly, as in the following call:
    # Name the repository explicitly so that R does not prompt for one.
    install.packages(pkgs = c('dplyr'),
                     repos = c('http://cran.cnr.berkeley.edu/'))
  • If you inadvertently include an interactive function in the code, the operator hangs as described above, but the flow can be stopped with the Stop link in the Team Studio user interface, which stops the entire workflow. If you expect the result to return quickly, but the operator is stuck for a very long time, stop the flow and check for interactive user code.
  • You must write valid R code. The results of its execution appear in the console output and in the resulting data frame.
  • Try the R code on a very small input dataset first to test the logic.
  • To debug quickly, write the code incrementally, print intermediate results to the console, and hold off on creating an output data frame (see below) until the logic works. This is unnecessary if the code has already been debugged in RStudio.
  • You can view the console output in Team Studio; however, Team Studio does not display plots generated by R code. To visualize results, pass the output from the R Execute operator to a subsequent operator, or link with Spotfire or Tableau.
  • R Execute generates helpful messages to identify issues, including R code syntax errors, data type mismatches, and so on. See R Execute error messages for more information.
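The advice above about testing on a small dataset and printing to the console can be sketched as follows. This is an illustrative pattern in base R, not a prescribed workflow; `full_input` is a hypothetical stand-in for the real input data.

```r
# Hypothetical stand-in for a large input table.
full_input <- data.frame(id = 1:1000, value = sqrt(1:1000))

# Work on the first few rows while the logic is still being developed.
sample_input <- head(full_input, n = 10)

# Print the structure and a summary so they appear in the console output.
str(sample_input)
print(summary(sample_input$value))

# Wrap the step under test so that a failure produces a readable message
# in the console instead of an unexplained stop.
result <- tryCatch(
  transform(sample_input, value_squared = value^2),
  error = function(e) {
    message("Step failed: ", conditionMessage(e))
    NULL
  }
)
print(result)
```

Nothing in this sketch waits for user input, so it is safe to run in batch mode.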

Clean your data before you run the R Execute operator. By design, the R Execute operator does not do any data cleaning; it uses the specified data as is. If you know that your data is not clean (for example, the header is included in the data, there are missing or incorrect values, and so on), use other Team Studio operators or R commands to clean your data.
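If you choose to clean the data with R commands, a minimal base R sketch might look like the one below. The data frame `raw` and its columns are invented for illustration; it assumes a header row that was read in as data and a score column with missing and non-numeric values.

```r
# Hypothetical raw input: the header is repeated as a data row,
# so every column was read as text.
raw <- data.frame(
  id    = c("id", "1", "2", "3", "4"),
  score = c("score", "10.5", NA, "abc", "7"),
  stringsAsFactors = FALSE
)

# Drop rows that merely repeat the column names.
is_header <- raw$id == names(raw)[1]
clean <- raw[!is_header, , drop = FALSE]

# Convert to numeric types; text that is not a number becomes NA.
# The coercion warning is expected here, so it is suppressed on purpose.
clean$id    <- as.integer(clean$id)
clean$score <- suppressWarnings(as.numeric(clean$score))

# Keep only rows with no missing values.
clean <- clean[complete.cases(clean), , drop = FALSE]
rownames(clean) <- NULL
print(clean)
```

Dropping incomplete rows is only one possible policy; imputing or flagging missing values may suit your data better.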

The Team Studio product can be extended for use with the R language and environment for statistical computing and graphics (see https://www.r-project.org/) through use of the R Connector for Team Studio (the "R Connector"), which is subject to free open-source software license terms and is available on GitHub.

The R Connector is not part of the Team Studio product and therefore not within the scope of your license for the product. Accordingly, the R Connector is not covered by the terms of your agreement with the Team Studio product, including any terms concerning support, maintenance, or warranties. Download and use of the R Connector is solely at your own discretion and subject to the free open-source license terms applicable to R Connector.

Similarly, the R language and environment for statistical computing and graphics and related packages (the "R Engine and R Packages") are not within the scope of your license for the product. Accordingly, the R Engine and R Packages are not covered by the terms of your agreement with the Team Studio product, including any terms concerning support, maintenance, or warranties. Download and use of any R Engine or R Packages are solely at your own discretion and subject to the free open-source license terms applicable to the same. TIBCO bears no responsibility for the accuracy of R algorithms, bugs in R Packages, the stability of the R Engine, the logic of the user's R code, or the license implications of R itself. Please note that the R Engine is licensed under GNU General Public License (GPL), versions 2 and 3. R Packages are licensed under various licenses, including, but not limited to, GPL, Affero GPL (AGPL), BSD 2-clause and 3-clause licenses, the Artistic License, and the MIT license (see here for details). If you have any questions about such open source licenses, consult a software license lawyer for advice.