The Scheduler
The scheduler is the GridServer Broker component that assigns tasks to Engines. It attempts to make optimal matches based on criteria such as the Session priority level or SLA group, affinity, and the Serial Service and Serial Priority execution modes.
Scheduler Overview
The scheduler tries to keep the proper number of Engines allocated to every active Service Session at any given time. A Scheduling Event is any event that might result in a task being assigned to an Engine, such as an Engine finishing a task, an Engine logging in, or new tasks being added. On each Scheduling Event, the scheduler decides how many Engines each Session should have at that moment, based on static and dynamic criteria, and then assigns each Session the number of Engines it needs to reach that ideal level.
There are two modes under which the Scheduler can operate: Priority, or Service Level Agreement (SLA).
Priority Mode
Every GridServer Service has an associated priority. By default, there are ten priority levels, numbered 0 through 9 in ascending order of priority. When the priority is set to 0, the Service is suspended, meaning that no tasks are assigned. Priority can also be set to Urgent, which is covered later in this section. The priority is set when a Service is created. It can be changed at run time in the Administration tool or through the Admin API.
A Priority Weight is associated with each Priority Level. The weight defines the number of Engines allocated to a Session relative to all other active Sessions. For example, if Sessions A and B each have a weight of 2.0, Session C has a weight of 4.0, and there are eight Engines, then Sessions A and B are allocated two Engines each and Session C is allocated four. To set the weights, go to Admin > System Admin > Manager Configuration > Services and change the Priority Weights property. By default the weights are linear.
The number of priority levels can be changed. However, a large number of priority levels can impact performance when Serial Priority Mode is not enabled, so it is recommended that Serial Priority Mode be enabled in this case.
Two algorithms are available in Priority mode: Usage, which allocates Engines to all running Services as fairly as possible, and Time, which simply allocates Engines to Sessions in the order in which they were created. In addition, when Serial Priority Mode is enabled, Sessions of a higher priority are assigned Engines when needed before lower-priority Sessions. By default, Usage is used with Serial Priority Mode disabled.
Usage Algorithm
The scheduler takes into account the amount of usage that the Session has received over a given historical window of time. The “usage” refers to the amount of Engine clock time that the Session has occupied during that window. When a Session is created, its usage is initialized as if the Session had been running ideally over this window.
This usage provides the ordering in which Engines are allocated to Sessions. This addresses starvation issues, round-off error (the number of ideal Engines is rarely an integer), and under/over-utilization due to discrimination, changes in the number of available Engines, and so on.
On a scheduling event, each Session is assigned its ideal number of Engines less the number currently allocated to it, in order of least to most usage.
This approach can be seen as analogous to a CPU thread scheduling algorithm. Each Session is a “thread”, the Engines are the “CPU”, the window is the sample period, and each task is an uninterruptible unit of CPU time allotted to a thread.
Whenever an Engine or set of Engines is available for scheduling, the scheduler decides how many Engines each session must be allocated. In general, that value is:
Ideal Engines per Session = All Engines * Session Priority Weight / Total Weight,
where “Total Weight” is the sum of all Priority Weights of active sessions. This value is rounded up to the next integer, which prevents starvation when the ideal calculation is below 0.5 and ensures that the sum of the Ideal Engines values is always at least as large as the total number of Engines. The algorithm also accounts for cases in which the actual number of Engines that can be allocated is less than the ideal, such as when a Session is nearing completion or when Max Engines is used.
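The rounding rule can be illustrated with a minimal Python sketch. The function name `ideal_engines` and the dictionary of Session names are hypothetical and chosen only for illustration; the sketch ignores the adjustments for Sessions that are nearly finished or limited by Max Engines.

```python
import math

def ideal_engines(total_engines, weights):
    """Return the rounded-up ideal Engine count for each active Session.

    `weights` maps a (hypothetical) Session name to its Priority Weight.
    """
    total_weight = sum(weights.values())
    return {
        name: math.ceil(total_engines * weight / total_weight)
        for name, weight in weights.items()
    }

# The eight-Engine example from the Priority Mode description.
print(ideal_engines(8, {"A": 2.0, "B": 2.0, "C": 4.0}))  # {'A': 2, 'B': 2, 'C': 4}

# With fractional ideals, rounding up makes the sum at least the Engine count.
print(sum(ideal_engines(10, {"X": 1.0, "Y": 1.0, "Z": 1.0}).values()))  # 12
```

Because every share is rounded up, the ideals can add up to more Engines than exist; the usage ordering decides which Sessions are served first.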
Recall that a Session’s usage is the total Engine clock time spent on the Session over a configurable window of recent time. This includes running and completed tasks. When a Session is created, it must initialize its usage. The simplest, fairest method of doing this is to assume it has been operating in a steady state over the window with the ideal non-rounded number of Engines. The variables that monitor usage are then initialized accordingly. If no sessions are active, it initializes them such that the session’s ideal is the total number of Engines currently on the Broker.
Whenever there is an event that requires a scheduling episode, the scheduler assigns the proper number of Engines to each session for it to be at its ideal amount. This assignment is performed in order of least to most priority-normalized usage. If there are any unassigned Engines remaining after this initial round based on usage (typically due to disallowed conditions preventing assignment), a second tier round robin assignment is performed.
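The first assignment round can be sketched as follows. This is an illustration, not the Broker's implementation: the dictionary keys are hypothetical, and "priority-normalized usage" is assumed here to mean usage divided by Priority Weight, which the text does not state explicitly. The second-tier round robin is omitted.

```python
def assign_by_usage(free_engines, sessions):
    """Grant free Engines to Sessions in order of least normalized usage.

    Each Session is a dict with hypothetical keys:
    name, ideal, allocated, usage, weight.
    Assumption: normalized usage = usage / weight.
    """
    ordered = sorted(sessions, key=lambda s: s["usage"] / s["weight"])
    grants = {}
    for session in ordered:
        if free_engines == 0:
            break
        # A Session only needs enough Engines to reach its ideal.
        shortfall = max(0, session["ideal"] - session["allocated"])
        granted = min(shortfall, free_engines)
        if granted:
            grants[session["name"]] = granted
            free_engines -= granted
    return grants

sessions = [
    {"name": "X", "ideal": 3, "allocated": 1, "usage": 100.0, "weight": 1.0},
    {"name": "Y", "ideal": 3, "allocated": 0, "usage": 50.0, "weight": 1.0},
]
print(assign_by_usage(4, sessions))  # {'Y': 3, 'X': 1}
```

Session Y has the lower usage, so it is brought up to its ideal first, and X receives the one Engine that remains.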
Time Algorithm
To use the Time algorithm, set Serial Service Execution to true. This algorithm works as follows:
When a Session is created on the Broker, it is placed in the queue. On each scheduling episode, the scheduler simply iterates through the queue and assigns all the idle Engines it can to each Session. Normally, only the first Session is assigned Engines, except when that Session is finishing up, or if Discriminators prevent Engines from running on that Session. A Session keeps its place in the queue until it is destroyed, regardless of whether it has tasks queued or running.
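A minimal sketch of this queue walk is shown below. The function name and the `(name, usable)` pairs are hypothetical; `usable` stands in for however many idle Engines a Session can actually take once its pending tasks and Discriminators are considered.

```python
def assign_by_time(idle_engines, queue):
    """Walk the Session queue in creation order, granting idle Engines.

    `queue` is a list of (session_name, usable) pairs, oldest first, where
    `usable` is the number of idle Engines that Session can accept right now.
    """
    grants = []
    for name, usable in queue:
        if idle_engines == 0:
            break
        granted = min(usable, idle_engines)
        if granted > 0:
            grants.append((name, granted))
            idle_engines -= granted
    return grants

# The first Session can use everything, so later Sessions get nothing.
print(assign_by_time(5, [("first", 5), ("second", 3)]))   # [('first', 5)]

# The first Session is finishing up, so the remainder flows to the next one.
print(assign_by_time(5, [("first", 2), ("second", 10)]))  # [('first', 2), ('second', 3)]
```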
Serial Priority Mode
When Serial Priority Mode is enabled, the scheduler ensures that Sessions are assigned Engines in order of priority. The scheduler iterates through each priority level in descending order, and assigns as many Engines to Sessions at that level as possible. Either the Time Algorithm or the Usage Algorithm, depending on whether Serial Service Execution is enabled, is used on the subset of Sessions at the same Priority level. Note that this means that Priority takes precedence over creation time.
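The precedence of priority over creation time can be shown with a short sketch. It assumes the Time algorithm is used within each level, and the tuple layout `(name, priority, created_at)` is hypothetical.

```python
def serial_priority_order(sessions):
    """Return Session names in the order they are offered Engines.

    `sessions` is a list of (name, priority, created_at) tuples.
    Higher priority comes first; within a level, older Sessions come first
    (this assumes the Time algorithm is applied inside each level).
    """
    ordered = sorted(sessions, key=lambda s: (-s[1], s[2]))
    return [name for name, _priority, _created in ordered]

sessions = [("old-low", 3, 1), ("new-high", 7, 2), ("old-high", 7, 0)]
print(serial_priority_order(sessions))  # ['old-high', 'new-high', 'old-low']
```

The Session named `new-high` is served before `old-low` even though it was created later.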
Intrinsic Affinity
The scheduler uses the fact that an Engine has initialization data and updates from a particular Service to prioritize routing of subsequent requests to that Service. This feature, called affinity, reduces data movement, because unneeded Engines are not recruited into the Service. For example, Engine A has worked on Session X and Engine B has not. If both are idle and a task is submitted by X, Engine A is assigned the task. However, if Engine A is busy, Engine B is assigned the task. You can also use the AFFINITY_WAIT Service option to control how long a queued request avoids allocation to an available Engine that has no affinity, in the hope of later being matched to an Engine with affinity.
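The Engine A and Engine B example can be expressed as a small sketch. The function and argument names are hypothetical, and the sketch does not model AFFINITY_WAIT; it only shows the preference for an idle Engine that already holds the Service's state.

```python
def pick_engine(idle_engines, engines_with_affinity):
    """Prefer an idle Engine that has already worked on the Session.

    `idle_engines` is a list of Engine names; `engines_with_affinity` is the
    set of Engines holding initialization data for the Session.
    Returns None when no Engine is idle.
    """
    for engine in idle_engines:
        if engine in engines_with_affinity:
            return engine
    # No idle Engine has affinity: fall back to any idle Engine.
    return idle_engines[0] if idle_engines else None

print(pick_engine(["B", "A"], {"A"}))  # A  (both idle, A has affinity)
print(pick_engine(["B"], {"A"}))       # B  (A is busy)
```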
Affinity is not used when using the Time algorithm.
For more information about tuning or customizing how the scheduler uses affinity, see Optimizing the Grid.
Priority Aggregation
Priority Aggregation is a setting that can be enabled for the Usage algorithm. When enabled, the number of Engines to be allocated is aggregated over the entire group of Sessions running at a priority level, rather than per Session. That is,
Ideal Engines per Session = All Engines * Session Priority Weight / Total Weight / Sessions at Priority
This mode is used when you want to guarantee a known distribution of Engines amongst priority levels regardless of how many Sessions are running at that level.
Example:
With 100 Engines total, 1 Session at level 6 gets 60, and 1 Session at level 4 gets 40.
Without priority aggregation, if another level 4 Session is added, each level 4 Session now wants 29 Engines and the level 6 Session wants 43. With priority aggregation enabled, the level 6 Session still gets 60, and each level 4 Session gets 20.
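The figures in this example can be reproduced with a short sketch. Two assumptions are made that the text only implies: the weights are the default linear ones (weight equals priority level), and with aggregation enabled the Total Weight counts each active priority level once. The function name is hypothetical.

```python
import math
from collections import Counter

def ideal_per_session(total_engines, levels, aggregate):
    """Return the rounded-up ideal Engine count for each Session.

    `levels` lists the priority level of every active Session.
    Assumption: linear weights, so a Session's weight equals its level.
    """
    if aggregate:
        per_level = Counter(levels)            # Sessions at each priority level
        total_weight = sum(per_level.keys())   # each active level counted once
        return [
            math.ceil(total_engines * lvl / total_weight / per_level[lvl])
            for lvl in levels
        ]
    total_weight = sum(levels)                 # every Session counted
    return [math.ceil(total_engines * lvl / total_weight) for lvl in levels]

print(ideal_per_session(100, [6, 4, 4], aggregate=False))  # [43, 29, 29]
print(ideal_per_session(100, [6, 4, 4], aggregate=True))   # [60, 20, 20]
```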
Urgent Priority Services and Preemption
A Session’s priority can be set to Urgent when that Session must be serviced immediately, even preempting running tasks if necessary. An urgent Service’s weight is hard-coded to be essentially infinite, so that it is assigned all available Engines.
When an Engine is preempted, the task it is currently running is canceled and rescheduled, and the Engine becomes available for new tasks. If an urgent Service can still make use of more Engines after it has been assigned all free Engines, it might preempt some busy Engines, subject to two constraints that can be adjusted with configuration properties. First, the urgent Service must have been in the queue for Preempt Delay Seconds. Second, the percentage of Engines in the grid running urgent Services cannot exceed Preemptable Engine Percent.
For example, if Preemptable Engine Percent is set to 50, and 47 percent of the Engines are currently running urgent Services, then at most three percent of the Engines can be preempted. This value is not a hard limit on the number of Engines that might be running urgent Services, because free Engines are allocated to urgent Services regardless of how many Engines are already running urgent Services.
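The two constraints can be combined in a small sketch. The function and parameter names are hypothetical, and rounding the percentage cap down to a whole Engine is an assumption; the text does not say how fractional Engines are handled.

```python
def preemptable_engines(total_engines, urgent_running, preemptable_percent,
                        waited_seconds, preempt_delay_seconds):
    """Return how many busy Engines an urgent Service may still preempt.

    Assumption: the percentage cap is rounded down to a whole Engine.
    """
    # Constraint 1: the urgent Service must have waited long enough.
    if waited_seconds < preempt_delay_seconds:
        return 0
    # Constraint 2: Engines running urgent Services may not exceed the cap.
    cap = total_engines * preemptable_percent // 100
    return max(0, cap - urgent_running)

# 100 Engines, 47 already running urgent Services, cap of 50 percent.
print(preemptable_engines(100, 47, 50, waited_seconds=10, preempt_delay_seconds=5))  # 3
```

Setting the percentage to 0 makes the cap zero, so the function never allows preemption.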
The scheduler chooses Engines for preemption based on the following rules: Engines running an urgent Service are never preempted. An Engine running a task from a Service with lower priority is generally selected in preference to one running a higher-priority task. However, if the lower-priority task has been running for a long time, a short-running, higher-priority task might be preempted instead. The Preempt Threshold Minutes property determines the value at which this crossover happens. For example, if this property is set to 30, then an Engine that has just started running a priority 2 task is chosen for preemption over an Engine that has been running a priority 1 task for more than 30 minutes. Each candidate Engine is ranked by the value priority + (runningMillis / preemptThresholdMillis); consistent with the example, the Engine with the lowest value is preferred for preemption.
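The crossover can be checked with a short sketch of the ranking formula. The function names and the candidate tuples are hypothetical. Choosing the candidate with the lowest value is an assumption inferred from the 30-minute example rather than something the text states directly, and Engines running urgent Services are assumed to be excluded from the candidate list beforehand.

```python
def preemption_score(priority, running_minutes, preempt_threshold_minutes):
    """priority + (running time / threshold), with both times in minutes."""
    return priority + running_minutes / preempt_threshold_minutes

def choose_victim(candidates, preempt_threshold_minutes):
    """Pick the Engine to preempt from (engine, priority, running_minutes).

    Assumption: the lowest score is preempted first.
    """
    best = min(
        candidates,
        key=lambda c: preemption_score(c[1], c[2], preempt_threshold_minutes),
    )
    return best[0]

# Priority 2 task just started (score 2.0) versus a priority 1 task that has
# run for 45 minutes (score 2.5): the newer, higher-priority task is preempted.
print(choose_victim([("p2-new", 2, 0), ("p1-old", 1, 45)], 30))  # p2-new

# If the priority 1 task has run only 10 minutes, it is preempted instead.
print(choose_victim([("p2-new", 2, 0), ("p1-old", 1, 10)], 30))  # p1-old
```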
Warning: Preemption can have a significant performance impact on your grid and cause scheduling problems with other Services. It must be used with caution.
Other important points concerning priority Services and preemption:
• Tasks canceled by preemption are not subject to a rescheduling limit, since they are not considered failures.
• To prevent preemption from ever occurring, set Preemptable Engine Percent to 0.
• The first urgent Service on the queue might not get all free Engines if it doesn’t have enough tasks, is already using its maximum number of Engines, or discriminates against some Engines. Free Engines that are not taken by the first urgent Service are first offered to the other urgent Services on the queue, and then to all other Services.
SLA Mode
When the SLA mode is used, the scheduler guarantees that a number or percentage of Engines on the Broker is allocated to a group of Services, provided enough Engines are available.
To enable SLA scheduling, go to Admin > System Admin > Manager Configuration > Services and set SLA Scheduler Enabled to true. Then define the Broker’s SLA Groups property, a comma-delimited list of group names and values. The values must be either all integers or all floating-point numbers x where 0 < x < 1. Integer values specify the actual number of Engines; floating-point values specify the fraction of the Engines currently logged in to the Broker.
For example, setting SLA Groups to a=10,b=12,c=20 specifies that group A’s target is 10 Engines, group B’s target is 12 Engines, and group C’s target is 20 Engines. Alternatively, an SLA Groups setting of a=.25,b=.5,c=.1 sets the targets of groups A, B, and C to 25%, 50%, and 10% of Engines, respectively.
Sessions are assigned to groups by setting Description.SLA_GROUP_NAME on the Description. All Sessions must have this set; if it is not set, or is set to an invalid group name, the Session is rejected by the Broker.
On every scheduling event, the scheduler calculates how many Engines must be allocated to each SLA Group, and then assigns Engines to Sessions as follows:
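The format of the SLA Groups value can be illustrated with a small parser. This is a sketch for reading the two example strings, not the Broker's own parsing code; the function name and the returned `"count"` and `"fraction"` labels are hypothetical.

```python
def parse_sla_groups(setting):
    """Parse an SLA Groups string such as 'a=10,b=12,c=20'.

    Returns ("count", {...}) when every value is an integer, or
    ("fraction", {...}) when every value x satisfies 0 < x < 1.
    Raises ValueError for anything else, including mixed lists.
    """
    raw = {}
    for item in setting.split(","):
        name, _, value = item.strip().partition("=")
        raw[name.strip()] = value.strip()

    if all(value.isdigit() for value in raw.values()):
        return "count", {name: int(value) for name, value in raw.items()}

    fractions = {name: float(value) for name, value in raw.items()}
    if all(0 < value < 1 for value in fractions.values()):
        return "fraction", fractions
    raise ValueError("values must be all integers or all fractions in (0, 1)")

print(parse_sla_groups("a=10,b=12,c=20"))   # ('count', {'a': 10, 'b': 12, 'c': 20})
print(parse_sla_groups("a=.25,b=.5,c=.1"))  # ('fraction', {'a': 0.25, 'b': 0.5, 'c': 0.1})
```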
• The entire list of Sessions is ordered by the same Usage algorithm used in the Priority mode. This ensures fairness in the steady-state. Scheduling is performed round-robin within an episode, starting at the Session with the least amount of usage.
• When a Broker has enough Engines to meet all group SLA Engine needs, Engines are first assigned so that all SLA Groups have their SLA number met. Then any remaining Engines are divided up among the groups, weighted by their SLA numbers, rounded to the next integer.
• When a Broker does not have enough Engines to meet group SLA Engine needs, the Engines are allocated to groups by weight as in the case of the remainder when there are enough Engines.
For example, suppose you have set SLA Groups to a=20,b=30,c=50, and nine Engines remain after the SLA Engine needs are met. 9 * 20 / 100 = 1.8, so group A is entitled to 2 of the remaining Engines. Likewise, group B is entitled to 3 (from 2.7), and group C to 5 (from 4.5). These rounded shares add up to 10, one more than the nine Engines available, so one of the groups gets one less than its number due to rounding, but the usage algorithm corrects these inequities in the long run.
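The arithmetic in this example can be reproduced with a few lines of Python. The function name is hypothetical, and the sketch only computes each group's rounded entitlement; it does not decide which group ends up one Engine short.

```python
import math

def remainder_shares(remaining_engines, sla_targets):
    """Split leftover Engines among SLA Groups, weighted by SLA number.

    Each share is rounded up to the next integer, so the shares can add up
    to more than the number of Engines actually remaining.
    """
    total = sum(sla_targets.values())
    return {
        group: math.ceil(remaining_engines * target / total)
        for group, target in sla_targets.items()
    }

shares = remainder_shares(9, {"a": 20, "b": 30, "c": 50})
print(shares)                 # {'a': 2, 'b': 3, 'c': 5}
print(sum(shares.values()))   # 10
```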
The SLA scheduler does not take Affinity into account, nor is there any analog to Urgent Priority.
SLA Task Preemption
In the case of long-running tasks, you can use the task preemption option with the SLA scheduler to better distribute Services to idle Engines. When the SLA Preemption Enabled property is set to true and one SLA group is idle, other groups might be allocated its Engines. If Services are then added to the idle group, tasks are preempted to reallocate the Engines needed to meet its SLA.