Heartbeats and Failure Detection
Heartbeats are lightweight network messages that GridServer components send to one another at regular intervals, for example from Drivers to Brokers, from Engine Instances to Brokers, and from Engine Daemons to Directors. A Manager detects Driver or Engine failure when it does not receive a heartbeat within the configurable heartbeat interval. Drivers detect Broker failure when they fail to connect while submitting tasks or polling for results. Engines detect Broker failure when an attempt to report for work or return results fails. To minimize unnecessary messaging, a heartbeat is sent only if no other message has been sent within the heartbeat interval.
The heartbeat period for clients can be configured in the GridServer Administration Tool at Admin > System Admin > Manager Configuration > Communication.
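
The two rules described in this section (send a heartbeat only when nothing else has been sent within the interval, and treat a component as failed when nothing arrives within the interval) can be illustrated with a minimal Python sketch. This is not GridServer code or its API: the class names HeartbeatSender and FailureDetector, the injectable clock, and the timeout parameter are all hypothetical, and the sketch assumes that any received message, not only a heartbeat, counts as proof that the sender is alive.

```python
import time
from typing import Callable, Dict, List


class HeartbeatSender:
    """Hypothetical sender side: emits a heartbeat only after an idle interval."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_sent = clock()

    def record_message(self) -> None:
        # Any ordinary message (task submission, result, ...) resets the idle timer.
        self.last_sent = self.clock()

    def tick(self) -> bool:
        """Call periodically; returns True if a heartbeat was sent on this call."""
        if self.clock() - self.last_sent >= self.interval:
            # A real component would write to the network here.
            self.last_sent = self.clock()
            return True
        return False


class FailureDetector:
    """Hypothetical receiver side: flags components that have gone silent."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record(self, component_id: str) -> None:
        # Assumption: heartbeats and ordinary messages both count as liveness.
        self.last_seen[component_id] = self.clock()

    def failed(self) -> List[str]:
        # Components with no message for longer than the timeout, sorted by id.
        now = self.clock()
        return sorted(
            cid for cid, seen in self.last_seen.items() if now - seen > self.timeout
        )
```

In this sketch the sender would call record_message() whenever it sends normal traffic and tick() on a timer, while the receiver would call record() for every inbound message and check failed() periodically. Passing the clock in as a parameter is a design choice that makes the timing logic testable without sleeping.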