Data Movement Mechanisms
The GridServer software uses the following data movement mechanisms:
• Service Request Argument and Return Value
• Service Session State
• Shared Directories and Direct Data Transfer
• Resource Update
• GridCache
• Data References
Service Request Argument and Return Value
The most direct way to transmit data between a Grid client and an Engine is through:
• The argument to a Service request, and
• The return value from the Service request.
If you enable Direct Data Transfer, the data travels directly between Driver and Engine.
Each request is handled efficiently, but the aggregate data transfer across hundreds of requests adds up significantly. Therefore, factor data common to all requests into session state or init data, or distribute it by another mechanism.
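To see how the aggregate cost grows, the following rough sketch compares sending common data with every request against sending it once per Engine as session state. The function name, the byte counts, and the Engine count are illustrative assumptions, not GridServer measurements or API calls.

```python
def total_bytes(requests: int, engines: int, common: int, per_request: int,
                factor_common: bool) -> int:
    """Estimate bytes moved from the Driver for one batch of requests.

    Hypothetical model: when factor_common is True, the common data travels
    once per Engine (as session state); otherwise it rides along with every
    request argument.
    """
    if factor_common:
        return engines * common + requests * per_request
    return requests * (common + per_request)


# Illustrative numbers only: 500 requests, 20 Engines, 10 MB of common data,
# and 1 KB of request-specific data.
inline = total_bytes(500, 20, 10_000_000, 1_000, factor_common=False)
factored = total_bytes(500, 20, 10_000_000, 1_000, factor_common=True)
print(inline, factored)
```

With these assumed numbers, the factored layout moves roughly 0.2 GB instead of roughly 5 GB.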
Service Session State
Any Service Session can have an associated state. As described in the Services sections, this state resides on the Driver as well as on each Engine hosting the instance, so it is fault-tolerant with respect to Engine failure.
Service Session state is ideal for data that is specific to a session. Service Session state is easy to work with because it fits the standard object-oriented programming model; it is downloaded once per Engine.
Service Session state travels from Driver to Engine peer to peer, using GridServer’s Direct Data Transfer (DDT) feature. DDT is enabled by default. When DDT is enabled and a Service creation or Service request is initiated on a Driver, the initialization data or request argument stays on the Driver, and the Driver sends only a URL (not the data) to the Manager. When an Engine receives the request, it downloads the data directly from the Driver rather than from the Manager. This mechanism saves one network trip for the data and can significantly improve performance when the data is much larger than the URL that points to it, as is usually the case. It also greatly reduces the load on the Manager, improving Manager throughput and robustness.
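The DDT flow can be modeled in a few lines. The sketch below is an in-memory simulation only: the `Driver`, `Manager`, and `run_request` names and the URL layout are hypothetical and do not correspond to the GridServer API. It shows that with DDT the Manager relays only the URL, while without it the Manager relays the full payload.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    name: str
    store: dict = field(default_factory=dict)  # url -> payload kept on the Driver

    def publish(self, key: str, payload: bytes) -> str:
        url = f"http://{self.name}/data/{key}"  # hypothetical URL layout
        self.store[url] = payload
        return url

    def serve(self, url: str) -> bytes:
        # Stands in for the Driver's internal file server.
        return self.store[url]


@dataclass
class Manager:
    bytes_relayed: int = 0
    queue: list = field(default_factory=list)

    def submit(self, message: bytes) -> None:
        self.bytes_relayed += len(message)
        self.queue.append(message)

    def next_request(self) -> bytes:
        return self.queue.pop(0)


def run_request(driver: Driver, manager: Manager, payload: bytes,
                ddt_enabled: bool) -> bytes:
    """Return the data as the Engine would receive it."""
    if ddt_enabled:
        url = driver.publish("req-1", payload)
        manager.submit(url.encode())        # only the URL passes through the Manager
        message = manager.next_request()
        return driver.serve(message.decode())  # Engine downloads from the Driver
    manager.submit(payload)                 # the payload itself passes through the Manager
    return manager.next_request()
```

In this model, a 1 MB argument costs the Manager a few dozen bytes with DDT and the full 1 MB without it.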
Shared Directories and DDT
Some network configurations are more efficient using a shared directory for DDT rather than the internal file servers included in the Drivers and Engines. In this case, configure the Driver and Engines to read and write requests and results to the same shared network directory, rather than to transfer data over HTTP. All Engines and the Driver must have read and write permissions on this directory. Configure shared directories at the Service level with the SHARED_UNIX_DIR and SHARED_WIN_DIR options. If you use both Windows and UNIX Engines and Drivers, configure both options to be directories that resolve to the same directory location for the respective operating systems.
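As a minimal sketch of the mixed-platform case, the helper below picks the shared directory for the local operating system from a dictionary of Service options. Only the option names SHARED_UNIX_DIR and SHARED_WIN_DIR come from the text above; the `shared_dir_for` function and the example paths are assumptions for illustration, and how the options are actually set on a Service is not shown.

```python
import platform
from typing import Optional


def shared_dir_for(options: dict, system: Optional[str] = None) -> str:
    """Pick the shared DDT directory that applies to the given platform.

    `system` defaults to the local platform name as reported by
    platform.system(), for example "Windows" or "Linux".
    """
    system = system or platform.system()
    key = "SHARED_WIN_DIR" if system == "Windows" else "SHARED_UNIX_DIR"
    if key not in options:
        raise ValueError(f"{key} is not configured for this Service")
    return options[key]


# Hypothetical paths: both must resolve to the same physical location.
service_options = {
    "SHARED_UNIX_DIR": "/mnt/grid/ddt",
    "SHARED_WIN_DIR": r"\\fileserver\grid\ddt",
}
print(shared_dir_for(service_options))
```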
Resource Update
GridServer’s Resource Update mechanism replicates Grid Libraries, which are archives of versioned sets of resources, to Engines. It also replicates the contents of a directory on the Manager to a corresponding directory on each Engine. When you use Resource Update, use the Services > Services > Grid Libraries page in the GridServer Administration Tool to upload files to the Manager. After all currently running Services finish, the Engines download the new files. For more on Resource Update, see the TIBCO GridServer® Administration.
Resource Update is the best way to guarantee that the same file is on the disk of every Engine in your Grid. It is ideal for distributing application code, but it is also a good way to deliver configuration files or static data to Engines before your computation starts. Any data that changes infrequently, such as historical data, is a good candidate for distribution in this fashion.
GridCache
GridServer’s GridCache feature is a repository on the Manager that is aggressively cached by components (Drivers and Engines). The repository comprises a set of regions, each of which is a map from string keys to arbitrary values. The GridCache API supports reads, writes, removal of key-value pairs, and listing all keys in a region. For more information about GridCache, see GridCache.
A GridCache component caches every value that it gets or puts. If a component changes a key’s value or removes it, the Manager asks all components to invalidate their cached copy of that key’s value.
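The caching and invalidation behavior described above can be illustrated with a small in-memory model. The `CacheManager` and `Component` classes and their methods are hypothetical and are not the GridCache API; the sketch only mirrors the stated rules: a component caches what it gets or puts, and a change or removal causes the Manager to invalidate cached copies.

```python
class CacheManager:
    """Hypothetical stand-in for the repository held on the Manager."""

    def __init__(self):
        self.regions = {}     # region name -> {key: value}
        self.components = []
        self.reads = 0        # number of fetches that reached the Manager

    def register(self, component):
        self.components.append(component)

    def read(self, region, key):
        self.reads += 1
        return self.regions[region][key]

    def write(self, region, key, value):
        self.regions.setdefault(region, {})[key] = value
        self._invalidate(region, key)

    def delete(self, region, key):
        self.regions.get(region, {}).pop(key, None)
        self._invalidate(region, key)

    def keys(self, region):
        return sorted(self.regions.get(region, {}))

    def _invalidate(self, region, key):
        # Ask every component to drop its cached copy of this key.
        for component in self.components:
            component.local.pop((region, key), None)


class Component:
    """Hypothetical Driver or Engine with a local cache."""

    def __init__(self, manager):
        self.manager = manager
        self.local = {}
        manager.register(self)

    def get(self, region, key):
        if (region, key) not in self.local:
            self.local[(region, key)] = self.manager.read(region, key)
        return self.local[(region, key)]

    def put(self, region, key, value):
        self.manager.write(region, key, value)
        self.local[(region, key)] = value  # the writer caches what it put

    def remove(self, region, key):
        self.manager.delete(region, key)
```

In this model, repeated gets of an unchanged key reach the Manager only once per component, and a put by any component forces the others to fetch the new value on their next get.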
GridCache is fault-tolerant with respect to Engine failure because the data is stored on the Manager. When an Engine fails, its cached data is lost and its task is rescheduled. The Engine that picks up the rescheduled task gradually builds up its cache as it gets data from the Manager.
GridCache is a flexible and efficient way for Engines and Drivers to share data. As with Resource Update, an Engine needs only a single download to obtain a piece of constant data. Unlike Resource Update, GridCache supports data that changes over the life of a computation.
You can also use GridCache to have Engines post results. This is generally useful only if those results are inputs to subsequent computations.
Data References
GridServer Data References are objects that represent data existing on a GridServer client. Because a Data Reference is lightweight, you can pass it from one client to another so that only the destination that needs the data performs the data transfer. Typically, one client filesystem stores the data and another client’s file server serves the data.
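The idea of deferring the transfer to the destination can be sketched with a lazy reference object. The `DataReference` class below is a hypothetical illustration, not the GridServer class of the same name: it wraps a fetch function and counts how many times the data is actually pulled.

```python
from typing import Callable


class DataReference:
    """Hypothetical stand-in for a reference to data held by one client."""

    def __init__(self, fetch: Callable[[], bytes]):
        self._fetch = fetch
        self.transfers = 0  # how many times the data was actually moved

    def resolve(self) -> bytes:
        self.transfers += 1
        return self._fetch()


def intermediary(ref: DataReference) -> DataReference:
    # Forwards the reference without touching the underlying data.
    return ref


def destination(ref: DataReference) -> int:
    # Only the client that needs the data triggers the transfer.
    return len(ref.resolve())
```

Passing the reference through `intermediary` leaves `transfers` at zero; the count rises only when `destination` resolves it.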