Installing and downloading the required components

Before you configure TIBCO Enterprise Runtime for R (TERR) to work with SparkR, you must install Hadoop and Spark and download the sources for the SparkR package.

We have tested the configuration on the versions listed in these instructions.

Perform this task on a computer that meets the requirements for running Hadoop with Yarn, Spark, and TERR, and that has a browser and network access for downloading the components.

Prerequisites

  • You must have installed open-source R.
  • You must have installed TIBCO Enterprise Runtime for R.
  • You must be able to install Hadoop and Spark.
  • You must be able to download SparkR sources.
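One quick way to confirm that the basic prerequisites are in place is to check that the relevant command-line tools are on the PATH. The binary names below (R for open-source R, git for downloading the SparkR sources) are assumptions about a typical installation; the TERR launcher name varies by installation, so it is not checked here:

```shell
# Sanity check: report which prerequisite commands are on the PATH.
# Binary names are assumptions about a typical installation.
checked=0
for cmd in R git; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
  checked=$((checked + 1))
done
```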

Procedure

  1. Install Hadoop 2.6.0 (with Yarn).
    1. Browse to hadoop.apache.org.
    2. Follow the instructions to install Hadoop with Yarn.
    We have tested this configuration with Hadoop 2.6.0.
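As a rough sketch, fetching a Hadoop 2.6.0 binary release from the command line can look like the following. The archive URL and the /opt install prefix are assumptions; the download and unpack commands are left commented so you can adapt them to your site before running them:

```shell
# Sketch only: fetch and unpack a Hadoop 2.6.0 binary release.
# The archive URL and the /opt install prefix are assumptions.
HADOOP_VERSION=2.6.0
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
echo "Hadoop release archive: ${HADOOP_URL}"

# Uncomment to download and unpack:
# wget "${HADOOP_URL}"
# tar -xzf "hadoop-${HADOOP_VERSION}.tar.gz" -C /opt
# export HADOOP_HOME="/opt/hadoop-${HADOOP_VERSION}"
```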
  2. Install Spark 1.3.0.
    1. Browse to spark.apache.org.
    2. Follow the instructions to install Spark.
    We have tested this configuration with Spark 1.3.0.
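Similarly, a sketch of fetching the Spark 1.3.0 release archive follows. The archive URL and the /opt install prefix are assumptions; the commands that touch the network and file system are left commented:

```shell
# Sketch only: fetch and unpack the Spark 1.3.0 release archive.
# The archive URL and the /opt install prefix are assumptions.
SPARK_VERSION=1.3.0
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}.tgz"
echo "Spark release archive: ${SPARK_URL}"

# Uncomment to download and unpack:
# wget "${SPARK_URL}"
# tar -xzf "spark-${SPARK_VERSION}.tgz" -C /opt
```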
  3. Download the sources for SparkR.
    1. Browse to https://github.com/amplab-extras/SparkR-pkg/.
    2. Download the sources using git.
      For our test, we pulled the sources for the master branch with the last change as follows:
      commit 2167eec8187e3a10b08e3328ed6c2b5fc449edde
          Merge: a5eb4fd 1d6ff10
          Author: Zongheng Yang <zongheng.y@gmail.com>
          Date:   Tue Apr 7 23:14:41 2015 -0700
              Merge pull request #244 from sun-rui/SPARKR-154_5
              [SPARKR-154] Phase 4: implement subtract() and subtractByKey().
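The download step above can be sketched as the following git commands, using the commit shown in the log. The clone directory name (SparkR-pkg) is an assumption based on the repository name; the commands that touch the network are left commented:

```shell
# Sketch only: clone the SparkR repository and check out the commit
# noted above. The clone directory name is an assumption.
SPARKR_REPO="https://github.com/amplab-extras/SparkR-pkg.git"
SPARKR_COMMIT="2167eec8187e3a10b08e3328ed6c2b5fc449edde"
echo "Repository: ${SPARKR_REPO}"
echo "Commit:     ${SPARKR_COMMIT}"

# Uncomment to fetch the sources:
# git clone "${SPARKR_REPO}"
# cd SparkR-pkg
# git checkout "${SPARKR_COMMIT}"
```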

What to do next

Modify the SparkR sources so that they do not specify a hard-coded path to the Rscript command.