Parallel Processing & Distributed Systems - Chapter 4: Speedup

Thoai Nam
Khoa Công Nghệ Thông Tin – Đại Học Bách Khoa Tp.HCM
• Speedup & Efficiency
• Amdahl's Law
• Gustafson's Law
• Sun & Ni's Law
• Speedup:
  S = Time(the most efficient sequential algorithm) / Time(parallel algorithm)
• Efficiency:
  E = S / N, where N is the number of processors
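As a quick illustration, the sketch below computes both metrics from measured wall-clock times; the timing values, the processor count, and the helper names are hypothetical, chosen only to show how speedup and efficiency relate.

# Speedup and efficiency from measured wall-clock times (illustrative values).
def speedup(t_serial, t_parallel):
    # S = Time(best sequential algorithm) / Time(parallel algorithm)
    return t_serial / t_parallel

def efficiency(s, n_procs):
    # E = S / N, where N is the number of processors
    return s / n_procs

t1 = 120.0   # assumed time of the most efficient sequential algorithm (seconds)
tN = 18.0    # assumed time of the parallel algorithm on N processors (seconds)
N = 8

S = speedup(t1, tN)
E = efficiency(S, N)
print(f"S = {S:.2f}, E = {E:.2f}")   # S ≈ 6.67, E ≈ 0.83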
Amdahl's Law – Fixed Problem Size (1)

• The main objective is to produce the results as soon as possible
  – (ex) video compression, computer graphics, VLSI routing, etc.
• Implications
  – The upper bound on speedup is 1/α, where α is the sequential fraction
  – Make the sequential bottleneck as small as possible
  – Optimize the common case
• Modified Amdahl's law for a fixed problem size, including the overhead
Amdahl's Law – Fixed Problem Size (2)

[Figure: fixed problem size — the sequential run T(1) consists of a sequential part Ts and a parallel part Tp; on N processors (P0 … P9) only the parallel part is divided among the processors.]

Ts = αT(1)  ⇒  Tp = (1-α)T(1)

T(N) = αT(1) + (1-α)T(1)/N,  where N is the number of processors
Amdahl's Law – Fixed Problem Size (3)

Speedup = Time(1) / Time(N)

Speedup = T(1) / [αT(1) + (1-α)T(1)/N]
        = 1 / [α + (1-α)/N]
        → 1/α  as N → ∞
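For example (illustrative numbers), with a sequential fraction α = 0.1 and N = 10 processors, Speedup = 1 / (0.1 + 0.9/10) ≈ 5.26; even with infinitely many processors the speedup can never exceed 1/α = 10.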
Modified Amdahl's law, including the overhead:

Speedup = T(1) / [αT(1) + (1-α)T(1)/N + T_overhead]
        → 1 / [α + T_overhead/T(1)]  as N → ∞

The overhead includes parallelism and interaction overheads.
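A minimal sketch of these two formulas; the α value and the overhead term are assumptions chosen only for illustration.

# Amdahl speedup for a fixed problem size, with an optional overhead term.
def amdahl_speedup(alpha, n, t1=1.0, t_overhead=0.0):
    # T(N) = alpha*T(1) + (1-alpha)*T(1)/N + T_overhead
    t_n = alpha * t1 + (1 - alpha) * t1 / n + t_overhead
    return t1 / t_n

alpha = 0.1                         # assumed sequential fraction
for n in (1, 2, 4, 8, 16, 64):
    s = amdahl_speedup(alpha, n)
    s_ovh = amdahl_speedup(alpha, n, t_overhead=0.05)   # assumed overhead = 5% of T(1)
    print(f"N={n:3d}  S={s:5.2f}  S_with_overhead={s_ovh:5.2f}")

# As N grows, S approaches 1/alpha = 10, while the overhead lowers the bound
# to 1/(alpha + T_overhead/T(1)) ≈ 6.67.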
Gustafson’s Law – Fixed Time (1)
• User wants more accurate results within a time limit
  – Execution time is fixed as the system scales
  – (ex) FEM for structural analysis, FDM for fluid dynamics
• Properties of a work metric
  – Easy to measure
  – Architecture independent
  – Easy to model with an analytical expression
  – No additional experiment needed to measure the work
  – The measure of work should scale linearly with the sequential time complexity of the algorithm
• The time-constrained model seems to be the most generally viable one!
Gustafson's Law – Fixed Time (2)

[Figure: fixed-time scaling — the run on N processors (P0 … P9) has a sequential part Ws and a parallel part; an equivalent single-processor run would have to execute every processor's parallel work one after another. The total scaled workload is W(N).]

α = Ws / W(N)

W(N) = αW(N) + (1-α)W(N)
⇒ W(1) = αW(N) + (1-α)W(N)·N
Gustafson's Law – Fixed Time, without overhead

With Time = Work · k and W(N) = W:

Speedup = T(1) / T(N) = W(1)·k / (W(N)·k)
        = [αW + (1-α)N·W] / W
        = α + (1-α)N
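For example (illustrative numbers), with α = 0.1 and N = 10 processors the scaled speedup is 0.1 + 0.9·10 = 9.1, compared with roughly 5.26 under Amdahl's fixed-size assumption.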
Gustafson's Law – Fixed Time, with overhead

With Time = Work · k and W(N) = W + W0, where W0 is the overhead work:

Speedup = T(1) / T(N) = W(1)·k / (W(N)·k)
        = [αW + (1-α)N·W] / (W + W0)
        = [α + (1-α)N] / (1 + W0/W)
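Continuing the illustrative numbers above, if the overhead work W0 is 10% of W, the speedup drops from 9.1 to 9.1 / 1.1 ≈ 8.3.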
Sun and Ni's Law – Fixed Memory (1)

• Scale to the largest possible problem limited by the available memory space; equivalently, fix the memory usage per processor
• Speedup
  – Time(1)/Time(N) for the scaled-up problem is not appropriate
  – For a simple profile, G(N) is the increase of the parallel workload as the memory capacity increases N times
Sun and Ni's Law – Fixed Memory (2)

• W = αW + (1-α)W
• Let M be the memory capacity of a single node
• With N nodes:
  – the memory increases to N·M
  – the scaled work is W* = αW + (1-α)G(N)W

Speedup_MC = [α + (1-α)G(N)] / [α + (1-α)G(N)/N]
Sun and Ni's Law – Fixed Memory (3)

• Definition:
  A function g is a homomorphism if there exists a function ḡ such that, for any real number c and variable x, g(cx) = ḡ(c)·g(x).

• Theorem:
  If W = g(M) for some homomorphism function g, g(cx) = ḡ(c)·g(x), then, with all data being shared by all available processors, the simplified memory-bounded speedup is

  S*_N = [W_1 + ḡ(N)·W_N] / [W_1 + ḡ(N)·W_N / N]
       = [α + (1-α)G(N)] / [α + (1-α)G(N)/N]
Sun and Ni's Law – Fixed Memory (4)

Proof:
Let the memory requirement of W_N be M, i.e. W_N = g(M).
M is the memory requirement when 1 node is available.
With N nodes available, the memory capacity increases to N·M.
Using all of the available memory for the scaled parallel portion W*_N:

W*_N = g(N·M) = ḡ(N)·g(M) = ḡ(N)·W_N

S*_N = [W_1 + W*_N] / [W_1 + W*_N / N]
     = [W_1 + ḡ(N)·W_N] / [W_1 + ḡ(N)·W_N / N]
– When the problem size is independent of the system, the problem size is fixed: G(N) = 1 ⇒ Amdahl's Law.
– When the memory is increased N times, the workload also increases N times: G(N) = N ⇒ Gustafson's Law.
– For most scientific and engineering applications, the computational requirement increases faster than the memory requirement: G(N) > N.
S*_N = [W_1 + G(N)·W_N] / [W_1 + G(N)·W_N / N]
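A small sketch contrasting these three cases; the α value and the G(N) choices are assumptions chosen only to illustrate the formula above.

# Memory-bounded speedup: S(N) = [alpha + (1-alpha)*G(N)] / [alpha + (1-alpha)*G(N)/N]
# G(N) = 1 reduces to Amdahl's law, G(N) = N to Gustafson's law, G(N) > N is the memory-bounded case.
def memory_bounded_speedup(alpha, n, g):
    gn = g(n)
    return (alpha + (1 - alpha) * gn) / (alpha + (1 - alpha) * gn / n)

alpha = 0.1                                       # assumed sequential fraction
cases = {
    "Amdahl    (G(N)=1)":     lambda n: 1,
    "Gustafson (G(N)=N)":     lambda n: n,
    "Sun & Ni  (G(N)=N^1.5)": lambda n: n ** 1.5,  # one example of G(N) > N
}
for name, g in cases.items():
    print(f"{name}: S(16) = {memory_bounded_speedup(alpha, 16, g):.2f}")
# Prints roughly 6.40, 14.50, and 15.59 respectively.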
[Chart: speedup versus number of processors, comparing ideal linear speedup S(Linear) with a typical measured speedup S(Normal).]
• Parallelizing a code does not always result in a speedup; sometimes it actually slows the code down! This can be due to a poor choice of algorithm or to poor coding.
• The best possible speedup is linear, i.e. it is proportional to the number of processors: T(N) = T(1)/N, where N = number of processors and T(1) = time for the serial run.
• A code that continues to speed up reasonably close to linearly as the number of processors increases is said to be scalable. Many codes scale up to some number of processors, but adding more processors then brings no improvement. Very few, if any, codes are indefinitely scalable.
• Software overhead
  Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation (e.g. there may be additional index calculations necessitated by the manner in which data are "split up" among processors); i.e. there are generally more lines of code to be executed in the parallel program than in the sequential program.
• Load balancing
• Communication overhead