Parallel Processing & Distributed Systems - Chapter 7: Parallel Job Scheduling
Schedule: the allocation of tasks to processors
Dynamic scheduling
– A single queue of ready processes
– A physical processor accesses the queue to run the next process
– The binding of processes to processors is not tight
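The single-queue idea can be sketched with a toy example (illustrative only; the process names and the worker count are made up, not from the slides): each "physical processor" is a thread that repeatedly pulls the next ready process from one shared queue, so no process is bound to a particular processor.

```python
import queue
import threading

ready = queue.Queue()            # the single queue of ready processes
executed = []                    # (processor id, process) pairs
log_lock = threading.Lock()

def processor(pid):
    """A physical processor repeatedly takes the next ready process;
    the binding of processes to processors is not tight."""
    while True:
        try:
            proc = ready.get_nowait()
        except queue.Empty:
            return               # queue drained: this processor stops
        with log_lock:
            executed.append((pid, proc))

for name in ["P1", "P2", "P3", "P4", "P5"]:
    ready.put(name)

workers = [threading.Thread(target=processor, args=(i,)) for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# every process ran exactly once, on whichever processor was free
assert sorted(p for _, p in executed) == ["P1", "P2", "P3", "P4", "P5"]
```

Which processor runs which process varies between runs; only the set of executed processes is fixed, which is exactly the loose binding the slide describes.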
Static scheduling
– Only one process per processor
– Speedup can be predicted
Dynamic and static scheduling must co-exist in the same system
Administrative scheduling
Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM

Deterministic model
A parallel program is a collection of tasks, some of which must be completed before others begin
Deterministic model: the execution time needed by each task and the precedence relations between tasks are fixed and known before run time
Task graph (task -------- execution time):
T1 -------- 2
T2 -------- 3
T3 -------- 1
T4 -------- 2
T5 -------- 3
T6 -------- 3
T7 -------- 1
[Figure: task graph of T1–T7]

Gantt chart
A Gantt chart indicates the time each task spends in execution, as well as the processor on which it executes
[Figure: Gantt chart of tasks T1–T7 on two processors over time units 1–9]

Complexity of optimal scheduling
If all of the tasks take unit time, and the task graph is a forest (i.e., no task has more than one predecessor), then a polynomial-time algorithm exists to find an optimal schedule
If all of the tasks take unit time, and the number of processors is two, then a polynomial-time algorithm exists to find an optimal schedule
If the task lengths vary at all, or if there are more than two processors, then the problem of finding an optimal schedule is NP-hard

Graham's list scheduling algorithm
T = {T1, T2,…, Tn} is a set of tasks
µ: T → (0, ∞) is a function that associates an execution time with each task
< is a partial order on T
L is a list of the tasks in T
Whenever a processor has no work to do, it instantaneously removes from L the first ready task; that is, an unscheduled task whose predecessors under < have all completed execution.
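Graham's rule can be implemented directly as a sketch (illustrative Python, not from the slides; the precedence edges used in the example are assumed for illustration, since the chapter gives the edges only in a figure):

```python
def list_schedule(times, preds, order, p):
    """Graham's list scheduling: whenever a processor becomes idle, it
    takes the first ready task from the list L; ties between processors
    go to the lower index. Returns (task -> (processor, start), makespan)."""
    free_at = [0] * p                 # time at which each processor is free
    finish = {}                       # task -> completion time
    start = {}                        # task -> (processor, start time)
    remaining = list(order)
    while remaining:
        proc = min(range(p), key=lambda i: (free_at[i], i))
        now = free_at[proc]
        # ready = unscheduled tasks whose predecessors under < have completed
        ready = [t for t in remaining
                 if all(finish.get(q, float("inf")) <= now
                        for q in preds.get(t, ()))]
        if not ready:
            # nothing ready: the processor idles until the next completion
            free_at[proc] = min(f for f in finish.values() if f > now)
            continue
        t = ready[0]                  # first ready task in list order
        remaining.remove(t)
        start[t] = (proc, now)
        finish[t] = now + times[t]
        free_at[proc] = finish[t]
    return start, max(finish.values())

# The chapter's execution times; the edges below are assumed for illustration
times = {"T1": 2, "T2": 3, "T3": 1, "T4": 2, "T5": 3, "T6": 3, "T7": 1}
preds = {"T4": {"T1"}, "T5": {"T2"}, "T6": {"T3", "T4"}, "T7": {"T5", "T6"}}
L = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]
start, makespan = list_schedule(times, preds, L, p=2)
assert makespan == 9
```

With these assumed edges the two processors finish at time 9; changing the edges or the list L changes the schedule, which is the point of the next slides.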
(Ties are broken in favor of the processor with the lower index.)

Graham's list scheduling algorithm - Example
T1 -------- 2
T2 -------- 3
T3 -------- 1
T4 -------- 2
T5 -------- 3
T6 -------- 3
T7 -------- 1
L = {T1, T2, T3, T4, T5, T6, T7}
[Figure: Gantt chart of the schedule produced for T1–T7 on two processors]

Graham's list scheduling algorithm - Problem
T1 -------- 3
T2 -------- 2
T3 -------- 2
T4 -------- 2
T5 -------- 4
T6 -------- 4
T7 -------- 4
T8 -------- 4
T9 -------- 9
L = {T1, T2, T3, T4, T5, T6, T7, T8, T9}
[Figure: two Gantt charts showing two different schedules for T1–T9, illustrating that the schedule produced from list L can be far from optimal]

Coffman-Graham's scheduling algorithm (1)
Graham's list scheduling algorithm depends upon a prioritized list of tasks to execute
Coffman and Graham (1972) construct such a list for the simple case when all tasks take the same amount of time.

Coffman-Graham's scheduling algorithm (2)
Let T = {T1, T2,…, Tn} be a set of n unit-time tasks to be executed on p processors
If Ti < Tj, then task Ti is an immediate predecessor of task Tj, and Tj is an immediate successor of task Ti
Let S(Ti) denote the set of immediate successors of task Ti
Let α(Ti) be an integer label assigned to Ti
N(T) denotes the decreasing sequence of integers formed by ordering the set {α(T′) | T′ ∈ S(T)}

Coffman-Graham's scheduling algorithm (3)
1. Choose an arbitrary task Tk from T such that S(Tk) = ∅, and define α(Tk) to be 1
2. for i ← 2 to n do
   a. Let R be the set of unlabeled tasks with no unlabeled successors
   b. Let T* be the task in R such that N(T*) is lexicographically smaller than N(T) for all T in R
   c. Let α(T*) ← i
   endfor
3. Construct a list of tasks L = {Un, Un-1,…, U2, U1} such that α(Ui) = i for all i, where 1 ≤ i ≤ n
4.
Given (T, <, L), use Graham's list scheduling algorithm to schedule the tasks in T

Coffman-Graham's scheduling algorithm – Example (1)
[Figure: task graph of T1–T9 and the resulting two-processor schedule]

Coffman-Graham's scheduling algorithm – Example (2)
Step 1 of the algorithm:
Task T9 is the only task with no immediate successor. Assign 1 to α(T9)
Step 2 of the algorithm:
i=2: R = {T7, T8}, N(T7) = {1} and N(T8) = {1} ⇒ arbitrarily choose task T7 and assign 2 to α(T7)
i=3: R = {T3, T4, T5, T8}, N(T3) = {2}, N(T4) = {2}, N(T5) = {2} and N(T8) = {1} ⇒ choose task T8 and assign 3 to α(T8)
i=4: R = {T3, T4, T5, T6}, N(T3) = {2}, N(T4) = {2}, N(T5) = {2} and N(T6) = {3} ⇒ arbitrarily choose task T4 and assign 4 to α(T4)
i=5: R = {T3, T5, T6}, N(T3) = {2}, N(T5) = {2} and N(T6) = {3} ⇒ arbitrarily choose task T5 and assign 5 to α(T5)
i=6: R = {T3, T6}, N(T3) = {2} and N(T6) = {3} ⇒ choose task T3 and assign 6 to α(T3)

Coffman-Graham's scheduling algorithm – Example (3)
i=7: R = {T1, T6}, N(T1) = {6, 5, 4} and N(T6) = {3} ⇒ choose task T6 and assign 7 to α(T6)
i=8: R = {T1, T2}, N(T1) = {6, 5, 4} and N(T2) = {7} ⇒ choose task T1 and assign 8 to α(T1)
i=9: R = {T2}, N(T2) = {7} ⇒ choose task T2 and assign 9 to α(T2)
Step 3 of the algorithm:
L = {T2, T1, T6, T3, T5, T4, T8, T7, T9}
Step 4 of the algorithm:
The schedule is the result of applying Graham's list scheduling algorithm to task graph T and list L

Scheduling problems
Preemption inside spinlock-controlled critical sections
Cache corruption
Context switching overhead
[Figure: processors P0, P1 and P2 each entering and exiting a critical section]

Scheduling approaches
Global queue
Variable partitioning
Dynamic partitioning with two-level scheduling
Gang scheduling
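The Coffman-Graham labeling walked through above can be sketched in Python (an illustrative implementation; the successor sets are reconstructed from the example's N values, and ties are broken deterministically by task name, so within the tie group {T3, T4, T5} the labels differ from the example's arbitrary picks):

```python
def coffman_graham_labels(tasks, succ):
    """Label unit-time tasks per Coffman-Graham and return (alpha, L).
    succ maps each task to its set of immediate successors."""
    alpha = {}
    for i in range(1, len(tasks) + 1):
        # R: unlabeled tasks with no unlabeled successors
        R = [t for t in tasks
             if t not in alpha and all(s in alpha for s in succ.get(t, ()))]
        # N(t): decreasing sequence of the labels of t's successors;
        # pick the task whose N is lexicographically smallest
        def N(t):
            return sorted((alpha[s] for s in succ.get(t, ())), reverse=True)
        t_star = min(R, key=lambda t: (N(t), t))   # task name breaks ties
        alpha[t_star] = i
    # L = {Un, ..., U1}: tasks in decreasing label order
    L = sorted(alpha, key=alpha.get, reverse=True)
    return alpha, L

# Successor sets reconstructed from the worked example's N values
succ = {"T1": {"T3", "T4", "T5"}, "T2": {"T6"},
        "T3": {"T7"}, "T4": {"T7"}, "T5": {"T7"},
        "T6": {"T8"}, "T7": {"T9"}, "T8": {"T9"}}
tasks = [f"T{k}" for k in range(1, 10)]
alpha, L = coffman_graham_labels(tasks, succ)
assert alpha["T9"] == 1 and alpha["T8"] == 3 and alpha["T1"] == 8
assert L[0] == "T2" and L[-1] == "T9"
```

The forced choices (T9, T8, T3 vs T6, T6 vs T1, T1 vs T2) come out exactly as in the example; only the arbitrary ones inside the tie group can differ, and any such tie-break yields an equally valid list.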
Global queue
A copy of the uni-processor system runs on each node, while sharing the main data structures, specifically the run queue
Used in small-scale bus-based UMA shared-memory machines such as Sequent multiprocessors, SGI multiprocessor workstations, and the Mach OS
Automatic load sharing
Problems: cache corruption, and preemption inside spinlock-controlled critical sections

Variable partitioning
Processors are partitioned into disjoint sets and each job runs only in a distinct partition
Distributed-memory machines: Intel and nCube hypercubes, IBM SP2, Intel Paragon, Cray T3D
Problems: fragmentation, big jobs

Partitioning schemes (parameters taken into account):
Scheme   | User request | System load | Changes
Fixed    | no           | no          | no
Variable | yes          | no          | no
Adaptive | yes          | yes         | no
Dynamic  | yes          | yes         | yes

Dynamic partitioning with two-level scheduling
Changes in allocation occur during execution
Workpile model:
– The work = an unordered pile of tasks or chores
– The computation = a set of worker threads, one per processor, that take one chore at a time from the work pile
– Allows adjustment to different numbers of processors by changing the number of workers
– Two-level scheduling scheme: the OS deals with the allocation of processors to jobs, while applications handle the scheduling of chores on those processors

Gang scheduling
Problem: interactive response times ⇒ time slicing
– Global queue: uncoordinated manner
Observation:
– Coordinated scheduling is only needed if the job's threads interact frequently
– The rate of interaction can be used to drive the grouping of threads into gangs
Examples:
– Co-scheduling
– Family scheduling, which allows more threads than processors and uses a second level of internal time slicing

Several specific scheduling methods
Co-scheduling
Smart scheduling [Zahorjan et al.]
Scheduling in the NYU Ultracomputer [Edler et al.]
Affinity-based scheduling
Scheduling in the Mach OS

Co-scheduling
Context switching occurs between applications rather than between tasks of several applications
Solves the problem of "preemption inside spinlock-controlled critical sections"
Cache corruption?

Smart scheduling [Zahorjan et al.]
Avoids: (1) preempting a task when it is inside its critical section, and (2) rescheduling tasks that were busy-waiting at the time of their preemption until the task that is executing the corresponding critical section releases it
The problem of "preemption inside spinlock-controlled critical sections" is solved
Cache corruption?

Scheduling in the NYU Ultracomputer
Tasks can be formed into groups
Tasks in a group can be scheduled in any of the following ways:
– A task can be scheduled or preempted in the normal manner
– All the tasks in a group are scheduled or preempted simultaneously
– Tasks in a group are never preempted
In addition, a task can prevent its preemption irrespective of the scheduling policy (one of the above three) of its group

Affinity-based scheduling
Policy: a task is scheduled on the processor where it last executed [Lazowska and Squillante]
Alleviates the problem of cache corruption
Problem: load imbalance

Scheduling in the Mach OS
Threads
Processor sets: disjoint
Processors in a processor set are assigned a subset of threads for execution
– Priority scheduling: LQ, GQ(0),…, GQ(31)
– If LQ and GQ(0–31) are empty, the processor executes a special idle thread until a thread becomes ready
– Preemption occurs if an equal- or higher-priority ready thread is present
[Figure: processors P0, P1,…, Pn, each with a local queue (LQ), sharing a global queue (GQ) with priority levels 0–31]
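The Mach-style queue lookup can be sketched as a small model (illustrative only, not the actual Mach implementation; the class name is made up, and a lower queue index is assumed to mean higher priority): a processor first consults its set's local queue (LQ), then GQ(0) through GQ(31), and falls back to the idle thread.

```python
from collections import deque

class ProcessorSet:
    """Toy model of a Mach processor set with one local queue (LQ)
    and 32 global priority queues GQ(0)..GQ(31)."""
    NUM_PRIORITIES = 32

    def __init__(self):
        self.local = deque()                         # LQ: threads bound here
        self.gq = [deque() for _ in range(self.NUM_PRIORITIES)]

    def enqueue(self, thread, priority=None):
        if priority is None:
            self.local.append(thread)                # goes to this set's LQ
        else:
            self.gq[priority].append(thread)         # goes to GQ(priority)

    def next_thread(self):
        if self.local:                               # LQ is checked first
            return self.local.popleft()
        for q in self.gq:                            # then GQ(0) ... GQ(31)
            if q:
                return q.popleft()
        return "idle-thread"                         # special idle thread

ps = ProcessorSet()
ps.enqueue("thread-A", priority=5)
ps.enqueue("thread-B", priority=2)
ps.enqueue("thread-C")                               # local queue
assert ps.next_thread() == "thread-C"                # LQ wins
assert ps.next_thread() == "thread-B"                # then highest-priority GQ
assert ps.next_thread() == "thread-A"
assert ps.next_thread() == "idle-thread"             # all queues empty
```

Preemption is not modeled here; in the slide's scheme a running thread would additionally be preempted whenever an equal- or higher-priority thread becomes ready.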