Parallel Processing & Distributed Systems - Chapter 7: Parallel Job Schedulings

Schedule:
allocation of tasks to processors
Dynamic scheduling
– A single queue of ready processes
– A physical processor accesses the queue to run the next
process
– The binding of processes to processors is not tight
Static scheduling
– Only one process per processor
– Speedup can be predicted
27 trang | Chuyên mục: Xử Lý Song Song | Chia sẻ: dkS00TYs | Lượt xem: 1746 | Lượt tải: 2
Tóm tắt nội dung Parallel Processing & Distributed Systems - Chapter 7: Parallel Job Schedulings, để xem tài liệu hoàn chỉnh bạn click vào nút "TẢI VỀ" ở trên
must co-exist in the same system
 Administrative scheduling
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 A parallel program is a 
collection of tasks, some 
of which must be 
completed before others 
begin
 Deterministic model:
The execution time needed 
by each task and the 
precedence relations 
between tasks are fixed 
and known before run time
 Task graph
T1--------
2
T2-------
3
T3--------
1
T4--------
2
T5--------
3
T6--------
3
T7--------
1
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Gantt chart indicates the time each task 
spends in execution, as well as the 
processor on which it executes
T7T5T2T1
T6T3
T4
1 2 3 4 5 6 7 8 9
T1--------
2
T2-------
3
T3--------
1
T4--------
2
T5--------
3
T6--------
3
T7--------
1
Time
P
r
o
c
e
s
s
o
r
s
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 If all of the tasks take unit time, and the task graph is a 
forest (i.e., no task has more than one predecessor), then a 
polynomial time algorithm exists to find an optimal schedule
 If all of the tasks take unit time, and the number of 
processors is two, then a polynomial time algorithm exists to 
find an optimal schedule
 If the task lengths vary at all, or if there are more than two 
processors, then the problem of finding an optimal schedule 
is NP-hard.
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Graham’s list scheduling algorithm
 T = {T1, T2,…, Tn} 
a set of tasks
 µ: T→ (0,∞) 
a function associates an execution time with each task
 A partial order < on T
 L is a list of task on T
 Whenever a processor has no work to do, it instantaneously 
removes from L the first ready task; that is, an unscheduled 
task whose predecessors under < have all completed 
execution. (The processor with the lower index is prior)
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Graham’s list scheduling algorithm
- Example
T7T5T2T1
T6T3
T4
T1--------
2
T2-------
3
T3--------
1
T4--------
2
T5--------
3
T6--------
3
T7--------
1
Time
P
r
o
c
e
s
s
o
r
s
L = {T1, T2, T3, T4, T5, T6, T7}
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Graham’s list scheduling algorithm
- Problem
T8T6T3
T7T5T4T2
T9T1
T8T1
T7T4
T6T3
T9T5T2
T1--------
3
T9--------
9
T2--------
2
T3--------
2
T4--------
2
T5--------
4
T6--------
4
T7--------
4
T8--------
4 L = {T1, T2, T3, T4, T5, T6, T7, T8, T9}
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Coffman-Graham’s scheduling
algorithm (1)
 Graham’s list scheduling algorithm depends upon a 
prioritized list of tasks to execute
 Coffman and Graham (1972) construct a list of tasks for the 
simple case when all tasks take the same amount of time.
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Coffman-Graham’s scheduling
algorithm (2)
 Let T = T1, T2,…, Tn be a set of n unit-time tasks to be 
executed on p processors
 If Ti < Tj, then task is Ti an immediate predecessor of task Tj, 
and Tj is an immediate successor of task Ti
 Let S(Ti) denote the set of immediate successor of task Ti
 Let α(Ti) be an integer label assigned to Ti.
 N(T) denotes the decreasing sequence of integers formed 
by ordering of the set {α(T’)| T’ ∈ S(T)}
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Coffman-Graham’s scheduling
algorithm (3)
1. Choose an arbitrary task Tk from T such that S(Tk) = 0, and define α(Tk) 
to be 1
2. for i ← 2 to n do
a. R be the set of unlabeled tasks with no unlabeled successors
b. Let T* be the task in R such that N(T*) is lexicographically smaller 
than N(T) for all T in R
c. Let α(T*) ← i
endfor
3. Construct a list of tasks L = {Un, Un-1,…, U2, U1} such that α(Ui) = i for all i 
where 1 ≤ i ≤ n
4. Given (T, <, L), use Graham’s list scheduling algorithm to schedule the 
tasks in T
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Coffman-Graham’s scheduling
algorithm – Example (1)
T1
T3
T4
T5
T7
T8
T9
T2
T6
T5
T8T3T1
T9T7T4T6T2
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Coffman-Graham’s scheduling
algorithm – Example (2)
Step1 of algorithm
task T9 is the only task with no immediate successor. Assign 1 to α(T9)
Step2 of algorithm
 i=2: R = {T7, T8}, N(T7)= {1} and N(T8)= {1} ⇒ Arbitrarily choose task T7
and assign 2 to α(T7)
 i=3: R = {T3, T4, T5, T8}, N(T3)= {2}, N(T4)= {2}, N(T5)= {2} and N(T8)= {1} ⇒
Choose task T8 and assign 3 to α(T8)
 i=4: R = {T3, T4, T5, T6}, N(T3)= {2}, N(T4)= {2}, N(T5)= {2} and N(T6)= {3} ⇒
Arbitrarily choose task T4 and assign 4 to α(T4)
 i=5: R = {T3, T5, T6}, N(T3)= {2}, N(T5)= {2} and N(T6)= {3} ⇒ Arbitrarily 
choose task T5 and assign 5 to α(T5)
 i=6: R = {T3, T6}, N(T3)= {2} and N(T6)= {3} ⇒ Choose task T3 and assign 6 
to α(T3)
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Coffman-Graham’s scheduling
algorithm – Example (3)
 i=7: R = {T1, T6}, N(T1)= {6, 5, 4} and N(T6)= {3} ⇒ Choose task T6 and 
assign 7 to α(T6)
 i=8: R = {T1, T2}, N(T1)= {6, 5, 4} and N(T2)= {7} ⇒ Choose task T1 and 
assign 8 to α(T1)
 i=9: R = {T2}, N(T2)= {7} ⇒ Choose task T2 and assign 9 to α(T2)
Step 3 of algorithm
L = {T2, T1, T6, T3, T5, T4, T8, T7, T9}
Step 4 of algorithm
Schedule is the result of applying Graham’s list-scheduling algorithm to 
task graph T and list L
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Preemption inside spinlock-controlled critical sections
 Cache corruption
 Context switching overhead
Enter
→Critical Section
Exit
P0
→ Enter
Critical Section
Exit
P1
→ Enter
→Critical Section
Exit
P2
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Global queue
 Variable partitioning
 Dynamic partitioning with two-level scheduling
 Gang scheduling
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 A copy of uni-processor system on each node, while sharing 
the main data structures, specifically the run queue
 Used in small-scale bus-based UMA shared memory 
machines such as Sequent multiprocessors, SGI 
multiprocessor workstations and Mach OS
 Autonamic load sharing
 Cache corruption
 Preemption inside spinlock-controlled critical sections
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Processors are partitioned into disjoined sets and each job is 
run only in a distinct partition
 Distributed memory machines: Intel and nCube hypercudes, 
IBM PS2, Intel Paragon, Cray T3D
 Problem: fragmentation, big jobs
yesyesyesDynamic
noyesyesAdaptive
nonoyesVariable
nononoFixed
ChangesSystem loadUser request
Parameters taken into account
Scheme
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Dynamic partitioning with 
two-level scheduling
 Changes in allocation during execution
 Workpile model: 
– The work = an unordered pile of tasks or chores
– The computation = a set of worker threads, one per processor, that 
take one chore at time from the work pile
– Allowing for the adjustment to different numbers of processors by 
changing the number of the wokers
– Two-level scheduling scheme: the OS deals with the allocation of 
processors to jobs, while applications handle the scheduling of chores 
on those processors
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Problem: Interactive response times ⇒ time slicing
– Global queue: uncoordinated manner
 Observartion:
– Coordinated scheduling in only needed if the job’s threads interact 
frequently
– The rare of interaction can be used to drive the grouping of threads 
into gangs
 Samples: 
– Co-scheduling
– Family scheduling: which allows more threads than processors and
uses a second level of internal time slicing
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Several specific 
scheduling methods
 Co-scheduling
 Smart scheduling [Zahorijan et al.]
 Scheduling in the NYU Ultracomputer [Elter et al.]
 Affinity based scheduling
 Scheduling in the Mach OS
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Context switching between applications rather then between 
tasks of several applications.
 Solving the problem of “preemption inside spinlock-controlled 
critical sections”.
 Cache corruption???
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Advoiding:
(1) preempting a task when it is inside its critical section
(2) rescheduling tasks that were busy-waiting at the time of 
their preemption until the task that is executing the 
corresponding critical section releases it.
 The problem of “preemption inside spinlock-controlled critical 
sections” is solved.
 Cache corruption???.
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
Scheduling in 
the NYU Ultracomputer
 Tasks can be formed into groups
 Tasks in a group can be scheduled in any of the following 
ways:
– A task can be scheduled or preempted in the normal manner
– All the tasks in a group are scheduled or preempted simultaneously
– Tasks in a group are never preempted.
 In addition, a task can prevent its preemption irrespective of 
the scheduling policy (one of the above three) of its group.
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Policity: a tasks is scheduled on the processor where it last 
executed [Lazowska and Squillante]
 Alleviating the problem of cache corruption
 Problem: load imbalance
Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM
 Threads
 Processor sets: disjoin
 Processors in a processor set is assigned a subset of threads 
for execution.
– Priority scheduling: LQ, GQ(0),…,GQ(31)
– LQ and GQ(0-31) are empty: the processor executes an special idle
thread until a thread becomes ready.
– Preemption: if an equal or higher priority ready thread is present
0
1
31
P0
P1
Pn
Global 
queue
(GQ)
Local 
queue
(LQ)
File đính kèm:
Parallel Processing & Distributed Systems - Chapter 7 Parallel Job Schedulings.pdf