Grid Fault-Tolerance and Failover
GridServer is a fault-tolerant, resilient distributed computing platform. It recovers from component failures, guaranteeing the execution of Services over a distributed computing grid with diverse, intermittent compute resources. This section describes what GridServer does when an Engine, Driver, or Manager fails. Grid components can fail for a number of reasons, such as a power outage, a network failure, or an interruption by an end user. For the purposes of this discussion, failure means any event that leaves grid components unable to communicate with each other.