Reschedules and Retries
Before discussing scheduling behavior, we must define the terms Retry and Reschedule as they apply to task scheduling.
Retry
A Retry occurs when a task is re-queued because of a known failure of the task. Such failures can result from an error condition in the implementation, an inability to download data, or the failure of an Engine (the monitor has detected that the Engine is no longer connected, but it has not logged off). A Retry is always the result of the Engine returning the task as failed to the Broker. A retried task is always placed at the front of its session’s queue. The scheduler maintains a retry count for each task so that a limit can be placed on the number of allowed retries.
Reschedule
A Reschedule occurs when a task is re-queued regardless of whether it has failed. By default, a rescheduled task is placed at the back of its session’s queue, unless the Reschedule First configuration option on the Broker is set to true. (To set it, go to Admin > System Admin > Manager Configuration > Services.) The scheduler also maintains a reschedule count for each task. The following conditions result in a reschedule:
• Engine Logoff: When an Engine logs off gracefully while running a task (such as when UI or CPU idle conditions are met, or there is a forced rebalance), the task is rescheduled, but the reschedule count is not incremented, since there was no task error.
• Redundant Rescheduler: If any of the Redundant Rescheduler strategies are in effect, tasks might be rescheduled to other Engines. By default, those tasks are allowed to continue to run on the current Engines, in case they finish before the rescheduled tasks. In this case, the reschedule count is increased.
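The queue placement and counting rules described above can be sketched as a small toy model. This is an illustration only, not the product's API: the class and method names, the max_retries value, and the choice to return False once the retry limit is reached are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    retry_count: int = 0
    reschedule_count: int = 0


class SessionQueue:
    """Toy model of one session's task queue (hypothetical, not the product API)."""

    def __init__(self, reschedule_first=False, max_retries=3):
        self.pending = deque()
        self.reschedule_first = reschedule_first  # models the Reschedule First option
        self.max_retries = max_retries            # hypothetical retry limit

    def submit(self, task):
        self.pending.append(task)

    def retry(self, task):
        """Re-queue a failed task at the front; return False if the limit is reached."""
        if task.retry_count >= self.max_retries:
            return False  # assumed behavior once the limit is exhausted
        task.retry_count += 1
        self.pending.appendleft(task)
        return True

    def reschedule(self, task, graceful_logoff=False):
        """Re-queue a task at the back, or at the front if Reschedule First is set."""
        if not graceful_logoff:
            task.reschedule_count += 1  # a graceful Engine logoff is not counted
        if self.reschedule_first:
            self.pending.appendleft(task)
        else:
            self.pending.append(task)


queue = SessionQueue()
queue.submit(Task("b"))
queue.submit(Task("c"))
failed, moved = Task("a"), Task("d")
queue.retry(failed)                            # goes to the front of the queue
queue.reschedule(moved, graceful_logoff=True)  # goes to the back, count unchanged
order = [t.name for t in queue.pending]
print(order)  # ['a', 'b', 'c', 'd']
```

Constructing the queue with reschedule_first=True would instead place the rescheduled task at the front, mirroring the Reschedule First option.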
Timeout Behavior
When the INVOCATION_MAX_TIME option is set, no invocation of a request may exceed this value. If a task times out on an Engine, it can be either retried or rescheduled, depending on what makes more sense for your application. If the task is retried, the current Engine’s invoke process ends and the task is assigned to another Engine. If it is rescheduled, the task continues to run on the current Engine. In either case, the appropriate count is incremented.
The default behavior (retry) is set on the Broker. It can also be set for the Service Type on the Service Type Registry page, or programmatically when the Service Session is created.
The timing of a retry or reschedule depends mainly on three properties: the Task Max Time, the scheduler interval, and the Engine heartbeat.
When an Engine picks up a task, the start time is recorded. The scheduler wakes up at least once every scheduler interval to check for tasks in progress that have exceeded the max time. If Reschedule on Timeout is false (the default), the scheduler logs off Engines that have timed out, causing their tasks to be retried immediately. Each retried task is placed at the front of its session’s queue. Note that the Engine on which the task was running does not restart itself until the next message is sent, typically a heartbeat. If Reschedule on Timeout is true, the timed-out tasks are redundantly rescheduled and the Engines that have timed out are allowed to continue; a task is complete as soon as any Engine completes it.
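The check the scheduler performs on each wake-up can be sketched as follows. This is a simplified model under stated assumptions, not the actual scheduler: the RunningTask type, the sweep function, and the engine names are hypothetical, and the sketch only decides which action applies to each timed-out task.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    name: str
    engine: str
    start_time: float  # recorded when the Engine picks up the task


def sweep(running, now, task_max_time, reschedule_on_timeout=False):
    """One scheduler wake-up: return (action, task) pairs for timed-out tasks."""
    actions = []
    for task in running:
        if now - task.start_time <= task_max_time:
            continue  # still within its allowed time
        if reschedule_on_timeout:
            # The Engine keeps running; a redundant copy of the task is queued.
            actions.append(("reschedule", task))
        else:
            # The Engine is logged off; the task is retried at the front of the queue.
            actions.append(("retry", task))
    return actions


running = [
    RunningTask("t1", "engine-1", start_time=0.0),
    RunningTask("t2", "engine-2", start_time=40.0),
]
default_actions = sweep(running, now=60.0, task_max_time=50.0)
redundant_actions = sweep(running, now=60.0, task_max_time=50.0,
                          reschedule_on_timeout=True)
print([(action, task.name) for action, task in default_actions])
print([(action, task.name) for action, task in redundant_actions])
```

In this example only t1 has run longer than the 50-second limit at the time of the sweep, so it is the only task acted on in either mode.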
In general, the maximum time before a timed-out task is retried or rescheduled is the Task Max Time plus the Poll Period of the Service Rescheduler. For instance, if Task Max Time = 50 sec and Poll Period = 60 sec, the best case is 50 sec and the worst case is 110 sec. However, it can take up to one Engine Heartbeat interval for the Engine on which a task was retried to log back in.
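The arithmetic in the example above can be reproduced with a short calculation. The function name is hypothetical, and the heartbeat value of 10 seconds is an assumed figure chosen only to illustrate the final caveat; it does not come from the product defaults.

```python
def timeout_window(task_max_time, poll_period):
    """Return (best, worst) seconds from task start until the timeout is acted on."""
    best = task_max_time                 # the scheduler wakes just as the limit passes
    worst = task_max_time + poll_period  # the scheduler woke just before the limit passed
    return best, worst


best, worst = timeout_window(task_max_time=50, poll_period=60)
print(f"best = {best} sec, worst = {worst} sec")  # best = 50 sec, worst = 110 sec

# Assumed heartbeat interval, for illustration only.
engine_heartbeat = 10
# Upper bound, in this simple model, on when the timed-out Engine logs back in.
latest_engine_relogin = worst + engine_heartbeat
print(f"Engine back by about {latest_engine_relogin} sec")
```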