Parallel Processing & Distributed Systems - Chapter 4: Speedup
The main objective is to produce the results as
soon as possible
– (ex) video compression, computer graphics, VLSI
routing, etc
Implications
– Upper-bound is
– Make Sequential bottleneck as small as possible
– Optimize the common case
Modified Amdahl’s law for fixed problem size
including the overhead
ThoaiNam Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Speedup & Efficiency Amdahl’s Law Gustafson’s Law Sun & Ni’s Law Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Speedup: S = Time(the most efficient sequential algorithm) / Time(parallel algorithm) Efficiency: E = S / N with N is the number of processors Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Amdahl’s Law – Fixed Problem Size (1) The main objective is to produce the results as soon as possible – (ex) video compression, computer graphics, VLSI routing, etc Implications – Upper-bound is – Make Sequential bottleneck as small as possible – Optimize the common case Modified Amdahl’s law for fixed problem size including the overhead Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Amdahl’s Law – Fixed Problem Size (2) ParallelSequentialSequential P5 P6 P7 P8P4P0 P1 P2 P3 P9SequentialParallel T(1) T(N) Ts Tp Ts=αT(1)⇒ Tp= (1-α)T(1) T(N) = αT(1)+ (1-α)T(1)/N Number of processors Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Amdahl’s Law – Fixed Problem Size (3) ∞→→ − + = − + = Nas NN TT TSpeedup αα α α α 1 )1( 1 )1()1()1( )1( )( )1( NTime TimeSpeedup = Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM ∞→ + → + − + = Nas T TT N TT TSpeedup overhead overhead )1( 1 )1()1()1( )1( α α α The overhead includes parallelism and interaction overheads Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Gustafson’s Law – Fixed Time (1) User wants more accurate results within a time limit – Execution time is fixed as system scales – (ex) FEM for structural analysis, FDM for fluid dynamics Properties of a work metric – Easy to measure – Architecture independent – Easy to model with an analytical expression – No additional experiment to measure the work – The measure of work should scale linearly with sequential time complexity of the algorithm Time constrained seems to be most generally viable model! Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Gustafson’s Law – Fixed Time (2) Parallel P5 P6 P7 P8P4P0 P1 P2 P3 P9Sequential Sequential P0Sequential P9 . . . W0Ws α = Ws / W(N) W(N) = αW(N) + (1-α)W(N) ⇒W(1) = αW(N) + (1-α)W(N)N W(N) Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Gustafson’s Law – Fixed Time without overhead N W NWW kNW kW NT TSpeedup )1(1( ).( ).1( )( )1( αα αα −+= )−+ === Time = Work . k W(N) = W Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Gustafson’s Law – Fixed Time with overhead W W N WW NWW kNW kW NT TSpeedup 00 1 1(1( ).( ).1( )( )1( + )−+ = + )−+ === αααα W(N) = W + W0 Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Sun and Ni’s Law – Fixed Memory (1) Scale the largest possible solution limited by the memory space. Or, fix memory usage per processor Speedup, – Time(1)/Time(N) for scaled up problem is not appropriate. – For simple profile, and G(N) is the increase of parallel workload as the memory capacity increases n times. Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Sun and Ni’s Law – Fixed Memory (2) N NG NGSpeedupMC )()1( )()1( αα αα −+ −+ = W=αW+(1- α)W Let M be the memory capacity of a single node N nodes: – the increased memory nM – The scaled work: W=αW+(1- α)G(N)W Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Definition: A function is homomorphism if there exists a function such that for any real number c and variable x, . Theorem: If W = for some homomorphism function , , then, with all data being shared by all available processors, the simplified memory- bounced speedup is Sun and Ni’s Law – Fixed Memory (3) N NG NG W N NgW WNgWS N N N )()1( )()1( )( )( 1 1* αα αα −+ −+ = + + = g )()()( xgcgcxg = g )(Mg g )()()( xgcgcxg = Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Proof: Let the memory requirement of Wn be M, Wn = . M is the memory requirement when 1 node is available. With N nodes available, the memory capacity will increase to NM. Using all of the available memory, for the scaled parallel portion : . Sun and Ni’s Law – Fixed Memory (4) NN WNgMgNgNMgW )()()()( * === )(Mg * NW N N N N N W N NgW WNgW N WW WWS )( )( 1 1 * * 1 ** 1* + + = + + = Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM – When the problem size is independent of the system, the problem size is fixed, G(N)=1⇒ Amdahl’s Law. – When memory is increased N times, the workload also increases N times, G(N)=N⇒ Gustafson’s Law – For most of the scientific and engineering applications, the computation requirement increases faster than the memory requirement, G(N)>N. N N N W N NGW WNGWS )( )( 1 1* + + = Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM 0 2 4 6 8 10 Processors S(Linear) S(Normal) Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Parallelizing a code does not always result in a speedup; sometimes it actually slows the code down! This can be due to a poor choice of algorithm or to poor coding The best possible speedup is linear, i.e. it is proportional to the number of processors: T(N) = T(1)/N where N = number of processors, T(1) = time for serial run. A code that continues to speed up reasonably close to linearly as the number of processors increases is said to be scalable. Many codes scale up to some number of processors but adding more processors then brings no improvement. Very few, if any, codes are indefinitely scalable. Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Software overhead Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation. (e.g. there may be additional index calculations necessitated by the manner in which data are "split up" among processors.) i.e. there is generally more lines of code to be executed in the parallel program than the sequential program. Load balancing Communication overhead
File đính kèm:
- Parallel Processing & Distributed Systems - Chapter 4 Speedup.pdf