Modifying the SparkR sources to use a specified engine

The SparkR package we tested hard-codes the command Rscript for starting the R engine on worker nodes. To use SparkR with TIBCO Enterprise Runtime for R (TERR), we needed to change the SparkR sources so that they can call a different engine instead.

Perform this task using a code editor on a computer that meets the prerequisites.

Prerequisites

Procedure

  • To allow a different command to invoke the engine, make the following change to the file SparkR-pkg/pkg/src/src/main/scala/edu/berkeley/cs/amplab/sparkr/RRDD.scala.
    Change
    private def createRProcess(rLibDir: String, port: Int, script: String) = {
        val rCommand = "Rscript"
        val rOptions = "--vanilla"
        ...
    to
    private def createRProcess(rLibDir: String, port: Int, script: String) = {
        // Read the engine command from the Spark configuration,
        // falling back to "Rscript" when the property is not set.
        val rCommand = SparkEnv.get.conf.get("spark.sparkr.r.command",
                                             "Rscript")
        val rOptions = "--vanilla"
        ...
    Note: We hope to contribute this change to the SparkR sources at some point.
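
The effect of this change can be illustrated with a minimal standalone sketch. It is not the SparkR code: a plain Map stands in for the Spark configuration so that the example runs without Spark. The object name RCommandLookup, the helper names, the script name worker.R, and the TERR path are hypothetical, and the order of the launch arguments is an assumption.

```scala
// Sketch only: a plain Map stands in for the Spark configuration.
object RCommandLookup {
  // Property name and default value as used in the change to RRDD.scala.
  val Key = "spark.sparkr.r.command"
  val Default = "Rscript"

  // Same idea as conf.get(key, default): use the configured value
  // when the property is set, otherwise fall back to the default.
  def rCommand(conf: Map[String, String]): String =
    conf.getOrElse(Key, Default)

  // Hypothetical helper: the argument list a worker might launch.
  // "--vanilla" comes from the source; the argument order is assumed.
  def commandLine(conf: Map[String, String], script: String): Seq[String] =
    Seq(rCommand(conf), "--vanilla", script)

  def main(args: Array[String]): Unit = {
    // Property not set: the default engine command is used.
    println(commandLine(Map.empty, "worker.R"))
    // Property set: the configured command is used (hypothetical path).
    println(commandLine(Map(Key -> "/opt/terr/bin/TERRscript"), "worker.R"))
  }
}
```

Because the default remains Rscript, a cluster that does not set the property should behave as it did before the change.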

What to do next

Build, configure, and test SparkR with TERR.