Reassigning Tasks in Exceptional Situations

Under normal operating conditions, distributed queue software arranges for exactly one worker to process each task. This section describes three exceptional conditions that require different semantics:

A worker exits or loses network communication before completing an assigned task. The scheduler reassigns the task to another worker.
A worker processes tasks more slowly than expected. The scheduler uses its complete time parameter to determine whether to reassign the task to another worker. Duplicate processing can occur.
Scheduler replacement—the scheduler exits or loses network communication, so another member replaces it as the active scheduler. The new scheduler reassigns incomplete tasks, guided by its complete time parameter. Duplicate processing can occur.

 

Two factors can affect behavior in exceptional situations:

When the sender and scheduler have a certified delivery agreement, behavior differs from when they do not.
Behavior differs depending on the scheduler’s complete time parameter.

Worker Exit

When a worker exits or loses network communication, the scheduler detects its absence and reassigns all of that worker’s incomplete tasks to other workers.

This behavior applies when the task source (sender) and the scheduler have a certified delivery agreement. This behavior applies for any setting of the scheduler’s complete time parameter.

Slow Worker

When a worker processes tasks more slowly than expected, the scheduler detects slow operation using timers controlled by the scheduler’s complete time parameter.

Complete Time = 0

Scheduler reassigns a task when the assigned worker does not accept it. Once a worker accepts, the scheduler waits indefinitely for completion.

Potential non-completion of tasks.

Complete Time > 0

Scheduler reassigns a task when the assigned worker does not accept it, or does not complete it in time.

Potential duplication of tasks.

This behavior applies only when the sender and scheduler have a certified delivery agreement. When no certified delivery agreement is in effect, the scheduler does not reassign tasks based on delayed completion.

The main concern in this situation is to ensure that all certified tasks complete in a timely fashion. When a slow worker hinders this goal, then the scheduler reassigns its task—selecting speed over unduplicated processing. However, if the complete time parameter is zero, then the scheduler selects unduplicated processing over speed (but see also, Scheduler Replacement).

Scheduler Replacement

When the scheduler exits or loses network communication, another member replaces it as the active scheduler.

Complete Time = 0

The new scheduler immediately reassigns all incomplete tasks.

Complete Time > 0

The new scheduler immediately reassigns all unaccepted tasks. It also sets a timer to elapse after the complete time; when the timer expires, the new scheduler reassigns all incomplete tasks.

This case presents the lowest probability of task duplication, but duplication is still possible for slow workers.

This behavior applies only when the sender and scheduler have a certified delivery agreement. When no certified delivery agreement is in effect, the new scheduler does not reassign tasks.

The main concern in this situation is to ensure that all certified tasks complete at least once (that is, no certified task remains unprocessed). The new scheduler reassigns all tasks that are at risk (for example, the assigned worker might have exited during scheduler replacement). In achieving this goal, the probability of duplicate processing is high. However, the lowest probability of duplicates is the case in which the complete time parameter is non-zero; in this case, the scheduler uses the extra information to reduce duplication.