The Spotfire Enterprise Runtime for R Parallel Package Overview

The parallel package contains a subset of functions that provide compatibility with the parallel computing features of open-source R. Using the Spotfire Enterprise Runtime for R parallel package, you can:

Define and create a cluster of Spotfire Enterprise Runtime for R computation nodes, either locally (multiple cores on a single machine) or remotely (on multiple machines running Spotfire Statistics Services).
Execute a parallelized computation on a cluster using one of a family of parallelized apply functions.

The parallel package implements several "dummy" functions. These are functions that exist only for compatibility with open-source R parallel functions.

To use the parallel package, you must set the JAVA_HOME environment variable. To check whether it is set, run the command Sys.getenv("JAVA_HOME") in the Spotfire Enterprise Runtime for R console.
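The check described above can be scripted. The following is a minimal sketch that relies only on base functions; the variable names java_home and java_home_set are illustrative and are not part of the package.

```r
# Sys.getenv returns an empty string when the variable is not set.
java_home <- Sys.getenv("JAVA_HOME")
java_home_set <- nzchar(java_home)

if (java_home_set) {
  message("JAVA_HOME is set to: ", java_home)
} else {
  message("JAVA_HOME is not set; set it before running library(parallel).")
}
```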

Running library(parallel) loads the terrJava package if it is not already loaded.

Spotfire Statistics Services allocates the parallel nodes you create to its available engines. For example, if you are running Spotfire Statistics Services with three engines and you create a cluster with more than three nodes, the nodes are treated as virtual nodes and Spotfire Statistics Services distributes them across the three engines. You can submit as many tasks as there are virtual nodes, but the tasks are allocated to engines according to engine availability.

If you use the parallel package with Spotfire Statistics Services, remember that each call to the server starts a new engine session. You cannot depend on a particular engine being used from one call to another. Each individual call could (and probably would) use a different engine, or even an entirely different machine. If you want to do some setup followed by an evaluation, write the script as one single evaluation.
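As an illustration of the single-evaluation advice, the sketch below keeps setup, computation, and cleanup in one script so that nothing depends on state left behind by an earlier call. The value offset and the worker function are hypothetical, and the cluster is created with default settings; adjust the makeCluster arguments for your environment.

```r
library(parallel)

# Setup, computation, and cleanup all belong to the same evaluation.
offset <- 10                       # hypothetical setup value
cl <- makeCluster(2)

result <- tryCatch(
  # The setup value travels with the call as an extra argument,
  # so no state from a previous call is assumed to exist on the nodes.
  parSapply(cl, 1:4, function(i, offset) i + offset, offset = offset),
  finally = stopCluster(cl)
)
```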

Warning: Clean Up Spawned Parallel Engines

When makeCluster (with type="TERR") creates a cluster of spawned engines, these processes remain until they are explicitly stopped by calling stopCluster, or until the Spotfire Enterprise Runtime for R process that spawned them exits.

This can cause problems if you call makeCluster repeatedly in a long-running Spotfire Enterprise Runtime for R engine, such as a local Spotfire Enterprise Runtime for R engine in Spotfire, or an engine in Spotfire Statistics Services that is reused to execute multiple tasks. In this case, you could create many spawned engine processes, which could ultimately slow down the computer.

A good way to avoid this problem is to call stopCluster after the cluster is used, with code such as the following. (It uses tryCatch so that the cluster is stopped even if an error occurs when computing with the cluster.) Alternatively, you can call on.exit inside a function.

clust <- makeCluster(3)
tryCatch(val <- clusterApply(clust, mylst, myfun), finally=stopCluster(clust))
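The on.exit alternative can be sketched as follows. The helper square_in_parallel and the worker function square are hypothetical names used only for illustration; the point is that on.exit registers stopCluster as soon as the cluster exists, so the spawned engines are stopped whether the function returns normally or fails.

```r
library(parallel)

# Hypothetical worker function, defined at top level.
square <- function(v) v^2

# Hypothetical helper that owns the cluster for the duration of one call.
square_in_parallel <- function(values, n_nodes = 2) {
  clust <- makeCluster(n_nodes)
  # Registered immediately, so the cluster is stopped on any exit path.
  on.exit(stopCluster(clust), add = TRUE)
  clusterApply(clust, values, square)
}

squares <- square_in_parallel(list(1, 2, 3))
```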

You can find the following functions in the parallel package. For more information on each function, see the package help.

Create parallel nodes

The following function creates a cluster of parallel nodes.

makeCluster

Creates a cluster. Use the spec argument to specify the number of nodes.
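A minimal sketch of choosing the node count for the spec argument follows. The cap of two nodes is an arbitrary choice for the example, and the use of length() to count nodes assumes that the cluster object is a list of nodes, as it is in open-source R.

```r
library(parallel)

# detectCores can return NA, so fall back to a single node in that case.
n_cores <- detectCores()
n_nodes <- if (is.na(n_cores)) 1L else min(2L, n_cores)

cl <- makeCluster(n_nodes)    # spec: the number of nodes to create
n_created <- length(cl)       # assumption: the cluster is a list of nodes
stopCluster(cl)
```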

Perform parallel computation

The following functions perform various computation chores on clusters.

clusterApply	Applies the specified function to the components of x on each node.
clusterApplyLB	Similar to clusterApply, but with load balancing.
clusterCall	Calls the specified function on each cluster node.
clusterEvalQ	Evaluates a literal expression on each cluster node.
clusterExport	Exports the specified objects to each cluster node.
clusterMap	Applies a function to multiple list or vector arguments on each node. Similar to mapply.
clusterSplit	Splits the specified sequence into a contiguous piece for each cluster node.
parApply	A parallel version of the apply function.
parCapply	A parallel column version of apply.
parLapply	A parallel version of lapply.
parRapply	A parallel row version of apply.
parSapply	A parallel version of sapply.
parLapplyLB	A parallel version of lapply, with load balancing.
parSapplyLB	A parallel version of sapply, with load balancing.
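The following sketch shows several of these functions working together on one cluster. The variable base_value and the anonymous worker functions are made up for the example, and the calls follow the open-source R signatures that the package is described as being compatible with; see the package help for the exact arguments supported.

```r
library(parallel)

cl <- makeCluster(2)

out <- tryCatch({
  # Define a variable in the global environment of every node.
  clusterEvalQ(cl, base_value <- 100)

  list(
    # Parallel counterparts of sapply and lapply.
    sapply_res = parSapply(cl, 1:5, function(i) i * 2),
    lapply_res = parLapply(cl, 1:3, function(i) i + 1),
    # Like mapply: pairs up the elements of the two vectors.
    map_res = clusterMap(cl, function(a, b) a + b, 1:3, 4:6),
    # Runs the same function once on each node.
    call_res = clusterCall(cl, function() base_value)
  )
}, finally = stopCluster(cl))
```

Here call_res holds one value per node, while the apply-style results hold one value per input element.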

Miscellaneous parallel functions

clusterSetRNGStream	Sets the random number generator for each node in the cluster to "L'Ecuyer-CMRG".
detectCores	Returns an integer value indicating the number of CPU cores, or NA if retrieving processor information is not supported on the current system.
nextRNGStream and nextRNGSubStream	Take a seed for the "L'Ecuyer-CMRG" random number generator and produce a new seed of the same kind.
setDefaultCluster	Registers a cluster as the default for the current session.
splitIndices	Splits the sequence of integers from 1 to nx into contiguous pieces for each of ncl cluster nodes.
stopCluster	Stops the engine nodes in the cluster cl.
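As a sketch of how two of these utilities might be used: splitIndices partitions a range of indices for manual chunking, and clusterSetRNGStream seeds the nodes so that repeated runs draw the same random numbers. The helper draw_on_nodes is hypothetical, and the iseed argument name is taken from open-source R; treat it as an assumption and confirm it in the package help.

```r
library(parallel)

# Partition the indices 1..10 into contiguous pieces for 3 nodes.
chunks <- splitIndices(10, 3)

# Hypothetical helper: seed a fresh cluster, then draw on each node.
draw_on_nodes <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)
  clusterSetRNGStream(cl, iseed = seed)   # assumption: iseed as in open-source R
  clusterCall(cl, runif, 3)               # three uniform draws per node
}

run1 <- draw_on_nodes(42)
run2 <- draw_on_nodes(42)
reproducible <- identical(run1, run2)
```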

Dummy parallel functions

The following functions are implemented as dummy functions to provide compatibility with the open-source R parallel functions. (These functions just run serial evaluation, not parallel evaluation.)

pvec	To support existing code, it just applies FUN to v and the other ... arguments.
mc.reset.stream	Does not reset the random number generators.
mcaffinity	This function does nothing.
mccollect	This function does nothing.
mclapply	This function just calls the non-parallel lapply.
mcparallel	Immediately evaluates and saves the value of expr.
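Because mclapply is described above as calling the non-parallel lapply, code written for open-source R should return the same values, only without the parallel speedup. A small sketch, with made-up inputs, that compares the two:

```r
library(parallel)

inputs <- 1:4

serial <- lapply(inputs, function(i) i^2)
compat <- mclapply(inputs, function(i) i^2)   # serial evaluation in this package

same <- identical(serial, compat)
```

If a computation needs real parallelism, one of the cluster-based functions such as parLapply is the alternative.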

Placeholder functions for R compatibility

The following functions are defined to be compatible with open-source R, but Spotfire Enterprise Runtime for R does not support these types of clusters. If they are called, they generate an error.

makeForkCluster
makePSOCKcluster
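Scripts ported from open-source R might call one of these functions directly. One possible way to keep such a script portable, offered only as a sketch, is to trap the error and fall back to makeCluster; the fallback policy here is an illustration, not a documented recommendation.

```r
library(parallel)

# Try the open-source R constructor first; if it generates an error,
# create the cluster with makeCluster instead.
cl <- tryCatch(
  makePSOCKcluster(2),
  error = function(e) makeCluster(2)
)

n_nodes <- length(cl)   # assumption: the cluster is a list of nodes
stopCluster(cl)
```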

Description