Parallel Processing & Distributed Systems - Chapter 2: Parallel Computer Models & Classification

Thoai Nam
Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM (Faculty of Information Technology – Ho Chi Minh City University of Technology)
• Abstract Machine Models:
  – PRAM, BSP, Phase Parallel
• Pipeline, Processor Array, Multiprocessor, Data Flow Computer
• Flynn Classification:
  – SISD, SIMD, MISD, MIMD
• Pipeline Computer
• An abstract machine model is mainly used in the design and analysis of parallel algorithms, without worrying about the details of physical machines.
• Three abstract machine models:
  – PRAM
  – BSP
  – Phase Parallel
• RAM (Random Access Machine)
[Figure: the RAM model – a program with a location counter, registers/memory cells r0, r1, r2, r3, …, a read-only input tape holding x1, x2, …, xn, and a write-only output tape.]
• Parallel Random-Access Machine (PRAM)
[Figure: the PRAM model – a control unit and processors P1, P2, …, Pn, each with its own private memory, connected through an interconnection network to a shared global memory.]
• A control unit
• An unbounded set of processors, each with its own private memory and a unique index
• Input stored in global memory or in a single active processing element
• A step: (1) read a value from a single private/global memory location
          (2) perform a RAM operation
          (3) write into a single private/global memory location
• During a computation step, a processor may activate another processor
• All active, enabled processors must execute the same instruction (albeit on different memory locations)
• Computation terminates when the last processor halts
• Definition:
  The cost of a PRAM computation is the product of the parallel time complexity and the number of processors used.
  Ex: a PRAM algorithm that has time complexity O(log p) using p processors has cost O(p log p).
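Written as a formula (a standard restatement of the definition above, with T(n, p) denoting the parallel running time on p processors):

\[ \mathrm{Cost}(n, p) = p \cdot T(n, p) \]

For instance, the SUM(EREW) algorithm shown later in this chapter runs in O(log n) time on ⌊n/2⌋ processors, so its cost is O(n log n).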
• The time complexity of a PRAM algorithm is often expressed in big-O notation
• Machine size n is usually small in existing parallel computers
• Ex:
  – Three PRAM algorithms A, B and C have time complexities of 7n, (n log n)/4, and n log log n, respectively.
  – In big-O notation: A (O(n)) < C (O(n log log n)) < B (O(n log n))
  – On machines with no more than 1024 processors:
    log n ≤ log 1024 = 10 and log log n ≤ log log 1024 < 4,
    and thus: B < C < A
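A quick numeric check of this comparison (a minimal sketch; the three running-time functions are exactly those given above):

import math

def t_A(n): return 7 * n                          # 7n
def t_B(n): return n * math.log2(n) / 4           # (n log n) / 4
def t_C(n): return n * math.log2(math.log2(n))    # n log log n

n = 1024
print(t_A(n), t_B(n), t_C(n))   # 7168, 2560.0, ~3401.6
# For n <= 1024 the practical ordering is B < C < A,
# even though asymptotically A < C < B.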
• PRAM execution can result in simultaneous accesses to the same location in shared memory.
  – Exclusive Read (ER)
    » No two processors can simultaneously read the same memory location.
  – Exclusive Write (EW)
    » No two processors can simultaneously write to the same memory location.
  – Concurrent Read (CR)
    » Processors can simultaneously read the same memory location.
  – Concurrent Write (CW)
    » Processors can simultaneously write to the same memory location, using some conflict-resolution scheme.
• Common/Identical CRCW
  – All processors writing to the same memory location must write the same value.
  – The software must ensure that different values are not attempted to be written.
• Arbitrary CRCW
  – Different values may be written to the same memory location, and an arbitrary one succeeds.
• Priority CRCW
  – An index is associated with each processor; when more than one processor writes to a location, the lowest-numbered processor succeeds.
  – The hardware must resolve any conflicts.
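A minimal sketch of how the three concurrent-write policies can be modelled (the function name and data layout are illustrative assumptions, not part of the PRAM definition):

def concurrent_write(requests, policy):
    """requests: list of (processor_index, value) pairs, all targeting one cell."""
    if policy == "common":
        values = {v for _, v in requests}
        assert len(values) == 1, "Common CRCW: all written values must be identical"
        return values.pop()
    if policy == "arbitrary":
        return requests[0][1]      # any single write may succeed
    if policy == "priority":
        return min(requests)[1]    # lowest-numbered processor wins
    raise ValueError(policy)

# Example: processors 3, 1 and 7 all write to the same location
print(concurrent_write([(3, 5), (1, 9), (7, 5)], "priority"))   # -> 9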
• Begins with a single active processor
• Two phases:
  – A sufficient number of processors are activated
  – These activated processors perform the computation in parallel
• ⌈log p⌉ activation steps are needed for p processors to become active (see the short derivation below)
• The number of active processors can be doubled by executing a single instruction
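Why ⌈log p⌉ activation steps suffice (a short derivation from the doubling rule above):

\[ \text{active}(k) \le 2^k, \qquad 2^k \ge p \;\Leftrightarrow\; k \ge \log_2 p \;\Rightarrow\; k = \lceil \log_2 p \rceil \]

For example, p = 1024 processors become active after 10 doubling steps.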
[Figure: binary-tree summation of the 10 values 3, 6, 5, 0, 1, 9, 2, 8, 3, 4 – the pairwise partial sums 9, 5, 10, 10, 7 are combined over successive steps until only the total, 41, remains.]
(EREW PRAM algorithm in Figure 2-7, page 32, book [1])
Ex: SUM(EREW)
Initial condition: List of n ≥ 1 elements stored in A[0..(n-1)]
Final condition: Sum of elements stored in A[0]
Global variables: n, A[0..(n-1)], j
begin
  spawn (P0, P1, …, P⌊n/2⌋-1)
  for all Pi where 0 ≤ i ≤ ⌊n/2⌋-1 do
    for j ← 0 to ⌈log n⌉ - 1 do
      if i modulo 2^j = 0 and 2i + 2^j < n then
        A[2i] ← A[2i] + A[2i + 2^j]
      endif
    endfor
  endfor
end
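A direct sequential simulation of this algorithm in Python (a sketch only: the inner loop over i stands in for the parallel processors, executed one after another within each step j):

import math

def sum_erew(A):
    # Simulate the SUM(EREW) PRAM algorithm; the result ends up in A[0].
    n = len(A)
    for j in range(math.ceil(math.log2(n))):      # ⌈log n⌉ parallel steps
        for i in range(n // 2):                   # processors P0 .. P⌊n/2⌋-1
            if i % (2 ** j) == 0 and 2 * i + 2 ** j < n:
                A[2 * i] += A[2 * i + 2 ** j]
    return A[0]

print(sum_erew([3, 6, 5, 0, 1, 9, 2, 8, 3, 4]))   # -> 41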
BSP – Bulk Synchronous Parallel
• BSP Model
  – Proposed by Leslie Valiant of Harvard University
  – Developed by W. F. McColl of Oxford University
[Figure: the BSP machine – processor/memory nodes (computation cost w) connected by a communication network (cost parameter g), with a barrier synchronization facility (cost parameter l).]
• A set of n nodes (processor/memory pairs)
• Communication network
  – Point-to-point, message passing (or shared variable)
• Barrier synchronizing facility
  – Synchronizes all nodes or a subset of them
• Distributed memory architecture
• A BSP program:
  – n processes, each residing on a node
  – Executing a strict sequence of supersteps
  – In each superstep, a process executes:
    » Computation operations: w cycles
    » Communication: gh cycles
    » Barrier synchronization: l cycles
• The basic time unit is a cycle (or time step)
• w parameter
  – Maximum computation time within each superstep
  – A computation operation takes at most w cycles.
• g parameter
  – Number of cycles needed to communicate one word of a message when all processors are involved in communication (a measure of network bandwidth); see the numeric example after this list
  – g = (total number of local operations performed by all processors in one second) / (total number of words delivered by the communication network in one second)
  – h: relation coefficient, i.e., the maximum number of words a node sends or receives in the superstep
  – A communication operation takes gh cycles.
• l parameter
  – Barrier synchronization takes l cycles.
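A tiny numeric illustration of the g definition (the machine figures here are made up for the example):

# Hypothetical machine: all processors together perform 1e9 local operations
# per second, while the network delivers 2.5e8 words per second in total.
g = 1e9 / 2.5e8     # -> 4.0 cycles per word communicated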
[Figure: two consecutive supersteps executed by processes P1, P2, P3, P4; each superstep consists of a computation part, a communication part, and a closing barrier.]
Time Complexity of BSP Algorithms
• Execution time of a superstep:
  – If the computation, communication, and synchronization operations are performed in sequence: w + gh + l
  – If the computation, communication, and synchronization operations are overlapped: max{w, gh, l}
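A small worked example of the two superstep-cost formulas (the parameter values are hypothetical):

w, g, h, l = 1000, 4, 50, 200         # cycles, cycles/word, words, cycles
sequential_time = w + g * h + l       # 1000 + 200 + 200 = 1400 cycles
overlapped_time = max(w, g * h, l)    # max(1000, 200, 200) = 1000 cycles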
• Proposed by Kai Hwang & Zhiwei Xu
• Similar to BSP:
  – A parallel program is a sequence of phases
  – The next phase cannot begin until all operations in the current phase have finished
  – Three types of phases (illustrated in the sketch after this list):
    » Parallelism phase: the overhead work involved in process management, such as process creation and grouping for parallel processing
    » Computation phase: local computation (data are already available)
    » Interaction phase: communication, synchronization or aggregation (e.g., reduction and scan)
• Different computation phases may execute different workloads at different speeds.
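A minimal sketch of the phase structure, using Python's multiprocessing only to make the three kinds of phases concrete (everything below is illustrative and not an API defined by the model):

from multiprocessing import Pool

def local_computation(chunk):
    return sum(chunk)                    # computation phase: purely local work

if __name__ == "__main__":
    data = list(range(16))
    chunks = [data[i::4] for i in range(4)]
    with Pool(4) as pool:                # parallelism phase: process creation and grouping
        partials = pool.map(local_computation, chunks)   # computation phase
    total = sum(partials)                # interaction phase: aggregation (reduction)
    print(total)                         # -> 120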
• A parallel machine model (also known as a programming model, type architecture, conceptual model, or idealized model) is an abstract parallel computer from the programmer's viewpoint, analogous to the von Neumann model for sequential computing.
• The abstraction need not imply any structural information, such as the number of processors and the interprocessor communication structure, but it should implicitly capture the relative costs of parallel computation.
• Every parallel computer has a native model that closely reflects its own architecture.
• Five semantic attributes
  – Homogeneity
  – Synchrony
  – Interaction mechanism
  – Address space
  – Memory model
• Several performance attributes
  – Machine size
  – Clock rate
  – Workload
  – Speedup, efficiency, utilization (see the formulas below)
  – Startup time
  – …
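For reference, the usual textbook definitions of speedup and efficiency, where T(p) is the execution time on p processors (these formulas are standard but are not spelled out on the slides; utilization is commonly defined as the fraction of available processor time or peak speed actually exploited, with the exact formulation varying by author):

\[ S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p} \]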
