Parallel Processing & Distributed Systems - Chapter 3: Parallel Computer Architectures
Based on notions of instruction and data streams
–SISD(Single Instruction stream, a Single Data stream )
–SIMD(Single Instruction stream, Multiple Data streams )
–MISD(Multiple Instruction streams, a Single Data stream)
–MIMD(Multiple Instruction streams, Multiple Data stream)
Popularity
–MIMD> SIMD> MISD
ons of instruction and data streams – SISD (Single Instruction stream, a Single Data stream ) – SIMD (Single Instruction stream, Multiple Data streams ) – MISD (Multiple Instruction streams, a Single Data stream) – MIMD (Multiple Instruction streams, Multiple Data stream) Popularity – MIMD > SIMD > MISD Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM SISD – Conventional sequential machines IS : Instruction Stream DS : Data Stream CU : Control Unit PU : Processing Unit MU : Memory Unit CU PU MU IS IS DS I/O Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM SIMD – Vector computers, processor arrays – Special purpose computations PE : Processing Element LM : Local Memory CU PE1 LM1 PEn LMn DS DS DS DS IS IS Program loaded from host Data sets loaded from host SIMD architecture with distributed memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MISD – Systolic arrays – Special purpose computations Memory (Program, Data) PU1 PU2 PUn CU1 CU2 CUn DS DS DS IS IS IS IS IS DS I/O MISD architecture (the systolic array) Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MIMD – General purpose parallel computers CU1 PU1 Shared Memory IS IS DS I/O CUn PUn IS DSI/O IS MIMD architecture with shared memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Classification based on Architecture Pipelined Computers Dataflow Architectures Data Parallel Systems Multiprocessors Multicomputers Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Instructions are divided into a number of steps (segments, stages) At the same time, several instructions can be loaded in the machine and be executed in different steps Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM – IF – instruction fetch – ID – instruction decode and register fetch – EX- execution and effective address calculation – MEM – memory access – WB- write back Instruction i IF ID EX WB IF ID EX WB IF ID EX WB IF ID EX WB IF ID EX WB Instruction i+1 Instruction i+2 Instruction i+3 Instruction # 1 2 3 4 5 6 7 8 Cycles MEM MEM MEM MEM MEM 9 Instruction i+4 Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Data-driven model – A program is represented as a directed acyclic graph in which a node represents an instruction and an edge represents the data dependency relationship between the connected nodes – Firing rule » A node can be scheduled for execution if and only if its input data become valid for consumption Dataflow languages – Id, SISAL, Silage, LISP,... – Single assignment, applicative(functional) language – Explicit parallelism Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM z = (a + b) * c + * a b c z The dataflow representation of an arithmetic expression Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Execution of instructions is driven by data availability – What is the difference between this and normal (control flow) computers? Advantages – Very high potential for parallelism – High throughput – Free from side-effect Disadvantages – Time lost waiting for unneeded arguments – High control overhead – Difficult in manipulating data structures Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM input d,e,f c0 = 0 for i from 1 to 4 do begin ai := di / ei bi := ai * fi ci := bi + ci-1 end output a, b, c / d1 e1 * + f1 / d2 e2 * + f2 / d3 e3 * + f3 / d4 e4 * + f4 a1 b1 a2 b2 a3 b3 a4 b4 c0 c4 c1 c2 c3 Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Execution on a Control Flow Machine a1 b1 c1 a2 b2 c2 a4 b4 c4 Sequential execution on a uniprocessor in 24 cycles Assume all the external inputs are available before entering do loop + : 1 cycle, * : 2 cycles, / : 3 cycles, How long will it take to execute this program on a dataflow computer with 4 processors? Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Execution on a Dataflow Machine c1 c2 c3 c4a1 a2 a3 a4 b1 b2 b4 b3 Data-driven execution on a 4-processor dataflow computer in 9 cycles Can we further reduce the execution time of this program ? Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Programming model – Operations performed in parallel on each element of data structure – Logically single thread of control, performs sequential or parallel steps – Conceptually, a processor associated with each data element Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM SIMD Architectural model – Array of many simple, cheap processors with little memory each » Processors don’t sequence through instructions – Attached to a control processor that issues instructions – Specialized and general communication, cheap global synchronization Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Instruction set includes operations on vectors as well as scalars 2 types of vector computers – Processor arrays – Pipelined vector processors Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM A sequential computer connected with a set of identical processing elements simultaneouls doing the same operation on different data. Eg CM-200 Data path Processing element Data memory I n t e r c o n n e c t i o n n e t w o r k Processing element Data memory Processing element Data memory I/0 Processor array Instruction path Program and Data Memory CPU I/O processor Front-end computer I/0 Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Stream vector from memory to the CPU Use pipelined arithmetic units to manipulate data Eg: Cray-1, Cyber-205 Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Consists of many fully programmable processors each capable of executing its own program Shared address space architecture Classified into 2 types – Uniform Memory Access (UMA) Multiprocessors – Non-Uniform Memory Access (NUMA) Multiprocessors Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Uses a central switching mechanism to reach a centralized shared memory All processors have equal access time to global memory Tightly coupled system Problem: cache consistency P1 Switching mechanism I/O P2 … Pn Memory banks C1 C2 Cn Pi Ci Processor i Cache i Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Crossbar switching mechanism Mem Mem Mem Mem Cache P I/OCache P I/O Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Shared-bus switching mechanism Mem Mem Mem Mem Cache P I/OCache P I/O Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Packet-switched network Network Mem Mem Cache P Cache P Cache P Mem Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Distributed shared memory combined by local memory of all processors Memory access time depends on whether it is local to the processor Caching shared (particularly nonlocal) data? Network Cache P Mem Cache P Mem Cache P Mem Cache P Mem Distributed Memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Current Types of Multiprocessors PVP (Parallel Vector Processor) – A small number of proprietary vector processors connected by a high-bandwidth crossbar switch SMP (Symmetric Multiprocessor) – A small number of COST microprocessors connected by a high-speed bus or crossbar switch DSM (Distributed Shared Memory) – Similar to SMP – The memory is physically distributed among nodes. Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM VP VP Crossbar Switch VP SM SM SM VP : Vector Processor SM : Shared Memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM SMP (Symmetric Multi-Processor) P/C P/C Bus or Crossbar Switch P/C SM SM SM P/C : Microprocessor and Cache SM: Shared Memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM DSM (Distributed Shared Memory) Custom-Designed Network MB MB P/C LM NIC DIR P/C LM NIC DIR MB: Memory Bus P/C: Microprocessor & Cache LM: Local Memory DIR: Cache Directory NIC: Network Interface Circuitry Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Consists of many processors with their own memory No shared memory Processors interact via message passing loosely coupled system P/C P/C Message-passing Interconnection Network P/C M M M P/C: Microprocessor & Cache M: Memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Current Types of Multicomputers MPP (Massively Parallel Processing) – Total number of processors > 1000 Cluster – Each node in system has less than 16 processors. Constellation – Each node in system has more than 16 processors Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MPP (Massively Parallel Processing) P/C LM NIC MB P/C LM NIC MB Custom-Designed Network P/C: Microprocessor & Cache MB: Memory Bus NIC: Network Interface Circuitry LM: Local Memory Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Commodity Network (Ethernet, ATM, Myrinet, VIA) MB MB P/C M NIC P/C M Bridge Bridge LD LD NIC IOB IOB MB: Memory Bus P/C: Microprocessor & Cache M: Memory LD: Local Disk IOB: I/O Bus NIC: Network Interface Circuitry Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM P/CP/C SM SM NICLD Hub Custom or Commodity Network >= 16 IOC P/CP/C SM SM NICLD Hub >= 16 IOC P/C: Microprocessor & Cache MB: Memory Bus NIC: Network Interface Circuitry SM: Shared Memory IOC: I/O Controller LD: Local Disk Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM Trend in Parallel Computer Architectures 0 50 100 150 200 250 300 350 400 1997 1998 1999 2000 2001 2002 Years MPPs Constellations Clusters SMPs
File đính kèm:
- Parallel Processing & Distributed Systems - Chapter 3 Parallel Computer Architectures.pdf