Installing and downloading the required components

Before you configure TIBCO Enterprise Runtime for R (TERR) to work with SparkR, you must install Hadoop and Spark and download the sources for the SparkR package.

We have tested the configuration on the versions listed in these instructions.

Perform this task on a computer that meets the requirements for running Hadoop with Yarn, Spark, and TERR, and that has a browser and network access for downloading the components.

Prerequisites

  • You must have installed open-source R.
  • You must have installed TIBCO Enterprise Runtime for R.
  • You must be able to install Hadoop and Spark.
  • You must be able to download SparkR sources.
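One quick way to confirm that the basic prerequisites are in place is to check that the relevant command-line tools are on the PATH. The binary names below (R for open-source R, git for downloading the SparkR sources) are assumptions about a typical installation; the TERR launcher name varies by installation, so it is not checked here:

```shell
# Sanity check: report which prerequisite commands are on the PATH.
# Binary names are assumptions about a typical installation.
checked=0
for cmd in R git; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
  checked=$((checked + 1))
done
```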

Procedure

  1. Install Hadoop 2.6.0 (with Yarn).
    1. Browse to hadoop.apache.org.
    2. Follow the instructions to install Hadoop with Yarn.
    We have tested this configuration with Hadoop 2.6.0.
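As a rough sketch, fetching a Hadoop 2.6.0 binary release from the command line can look like the following. The archive URL and the /opt install prefix are assumptions; the download and unpack commands are left commented so you can adapt them to your site before running them:

```shell
# Sketch only: fetch and unpack a Hadoop 2.6.0 binary release.
# The archive URL and the /opt install prefix are assumptions.
HADOOP_VERSION=2.6.0
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
echo "Hadoop release archive: ${HADOOP_URL}"

# Uncomment to download and unpack:
# wget "${HADOOP_URL}"
# tar -xzf "hadoop-${HADOOP_VERSION}.tar.gz" -C /opt
# export HADOOP_HOME="/opt/hadoop-${HADOOP_VERSION}"
```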
  2. Install Spark 1.3.0.
    1. Browse to spark.apache.org.
    2. Follow the instructions to install Spark.
    We have tested this configuration with Spark 1.3.0.
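Similarly, a sketch of fetching the Spark 1.3.0 release archive follows. The archive URL and the /opt install prefix are assumptions; the commands that touch the network and file system are left commented:

```shell
# Sketch only: fetch and unpack the Spark 1.3.0 release archive.
# The archive URL and the /opt install prefix are assumptions.
SPARK_VERSION=1.3.0
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}.tgz"
echo "Spark release archive: ${SPARK_URL}"

# Uncomment to download and unpack:
# wget "${SPARK_URL}"
# tar -xzf "spark-${SPARK_VERSION}.tgz" -C /opt
```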
  3. Download the sources for SparkR.
    1. Browse to https://github.com/amplab-extras/SparkR-pkg/.
    2. Download the sources using git.
      For our test, we pulled the sources for the master branch with the last change as follows:
      commit 2167eec8187e3a10b08e3328ed6c2b5fc449edde
          Merge: a5eb4fd 1d6ff10
          Author: Zongheng Yang <zongheng.y@gmail.com>
          Date:   Tue Apr 7 23:14:41 2015 -0700
              Merge pull request #244 from sun-rui/SPARKR-154_5
              [SPARKR-154] Phase 4: implement subtract() and subtractByKey().
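The download step above can be sketched as the following git commands, using the commit shown in the log. The clone directory name (SparkR-pkg) is an assumption based on the repository name; the commands that touch the network are left commented:

```shell
# Sketch only: clone the SparkR repository and check out the commit
# noted above. The clone directory name is an assumption.
SPARKR_REPO="https://github.com/amplab-extras/SparkR-pkg.git"
SPARKR_COMMIT="2167eec8187e3a10b08e3328ed6c2b5fc449edde"
echo "Repository: ${SPARKR_REPO}"
echo "Commit:     ${SPARKR_COMMIT}"

# Uncomment to fetch the sources:
# git clone "${SPARKR_REPO}"
# cd SparkR-pkg
# git checkout "${SPARKR_COMMIT}"
```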

What to do next

Modify the SparkR sources so that they do not specify a hard-coded path to the Rscript command.