-
8/18/2019 GPGPU Sim Tutorial
GPGPU-Sim Tutorial
Zhen Lin
North Carolina State University
Based on the GPGPU-Sim Tutorial and Manual by UBC
-
Outline
• GPGPU-Sim Overview
• Demo1: Setup & Configuration
• GPGPU-Sim Internals
• Demo2: Scheduling Study
-
GPGPU-Sim in a Nutshell
• Microarchitecture timing model of contemporary GPUs
• Runs unmodified CUDA/OpenCL applications
-
What GPGPU-Sim Simulates
• Functional model
  – PTX
  – SASS
• Timing model for the compute part of a GPU
  – Not for the CPU or PCIe
• Only models the microarchitecture timing relevant to compute
-
Functional model
• PTX
  – A low-level, data-parallel virtual machine and instruction set architecture
  – Sits between CUDA and the hardware ISA (SASS)
  – Stable ISA that spans multiple GPU generations
• SASS/PTXPlus
  – Hardware-native ISA
  – PTX -> translate + optimize -> SASS
  – More accurate, but not well supported
• CUDA tool chain
-
Functional Model (PTX)
• Scalar ISA
• SSA representation: register allocation is not done in PTX
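PTX defines each value once in an unbounded set of virtual registers. Mapping those values to physical registers is left to later tools. The hypothetical Python sketch below (not GPGPU-Sim code) makes the consequence concrete. It takes a straight-line SSA sequence and computes the peak number of simultaneously live virtual registers, which is a lower bound on what a register allocator would need. The instruction encoding and register names are assumptions made for illustration.

```python
def max_live_registers(program):
    """Peak number of virtual registers live between instructions.

    program: list of (dest, sources) tuples in straight-line SSA form.
    Registers read but never defined (e.g. kernel parameters) are treated
    as live-in, i.e. defined before the first instruction (index -1).
    Values that are never read are not counted as live.
    """
    defined_at = {}
    last_use = {}
    for idx, (dest, sources) in enumerate(program):
        for src in sources:
            defined_at.setdefault(src, -1)  # live-in value
            last_use[src] = idx
        if dest in defined_at:
            raise ValueError(f"{dest} defined twice: not SSA")
        defined_at[dest] = idx

    peak = 0
    for k in range(len(program)):
        # A register is live after instruction k if defined at or before k
        # and still read by some later instruction.
        live = sum(1 for reg, d in defined_at.items()
                   if d <= k < last_use.get(reg, d))
        peak = max(peak, live)
    return peak


# Hypothetical example: four virtual registers plus one parameter,
# but at most two values are ever live at the same time.
example = [
    ("%r1", ["%p0"]),
    ("%r2", ["%p0"]),
    ("%r3", ["%r1", "%r2"]),
    ("%r4", ["%r3"]),
]
peak = max_live_registers(example)
```

Here `peak` is 2 even though the program names five distinct values. This is why PTX can leave allocation to the back end.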
-
Timing Model for GPU Micro-Architecture
• GPGPU-Sim simulates the timing model of a GPU running each launched CUDA kernel
• Reports stats (e.g., number of cycles) for each kernel
• Excludes any time spent on data transfer over the PCIe bus
• The CPU is assumed to be idle while the GPU is working
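Statistics are reported per kernel, so a common post-processing step is to extract them from the simulator's output. The sketch below assumes the log contains `name = value` lines for `gpu_sim_cycle` and `gpu_sim_insn`, with each kernel's report printing the cycle count first. Check the stat names and ordering against your GPGPU-Sim version. The sample log values are made up.

```python
import re

# Assumed log format: "gpu_sim_cycle = <int>" and "gpu_sim_insn = <int>".
STAT_LINE = re.compile(r"^\s*(gpu_sim_cycle|gpu_sim_insn)\s*=\s*(\d+)\s*$")


def per_kernel_stats(log_text):
    """Group cycle/instruction counts into one dict per kernel launch."""
    kernels = []
    for line in log_text.splitlines():
        m = STAT_LINE.match(line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2))
        if name == "gpu_sim_cycle":
            kernels.append({})  # assumed: a new kernel's report starts here
        if kernels:
            kernels[-1][name] = value
    return kernels


def ipc(kernel):
    """Instructions per cycle for one kernel."""
    return kernel["gpu_sim_insn"] / kernel["gpu_sim_cycle"]


# Hypothetical log excerpt for two kernel launches (values are invented).
sample_log = """gpu_sim_cycle = 1000
gpu_sim_insn = 2500
gpu_sim_cycle = 400
gpu_sim_insn = 800
"""
stats = per_kernel_stats(sample_log)
total_gpu_cycles = sum(k["gpu_sim_cycle"] for k in stats)  # no PCIe time included
```

Because PCIe transfers are not simulated, summing the per-kernel cycles gives GPU compute time only.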
-
Demo1
• Setup
• Stats
• Configuration
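GPGPU-Sim is typically configured through a `gpgpusim.config` file in the application's working directory. The file holds command-line-style options, one `-option value` per line, with `#` comments. The sketch below parses such a file into a dictionary so a study can override a single option and write the file back out. The option names in the sample (`-gpgpu_n_clusters`, `-gpgpu_scheduler`) are for illustration; verify them against the config files shipped with your version.

```python
def parse_config(text):
    """Parse '-option value' lines into {option: value}; later lines win."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("-"):
            continue
        parts = line.split(None, 1)
        name = parts[0].lstrip("-")
        options[name] = parts[1].strip() if len(parts) > 1 else ""
    return options


def render_config(options):
    """Write options back out in the same '-option value' format."""
    return "\n".join(f"-{k} {v}".rstrip() for k, v in options.items()) + "\n"


# Hypothetical excerpt of a config file.
sample = """# hypothetical excerpt
-gpgpu_n_clusters 15
-gpgpu_scheduler lrr   # warp scheduling policy
"""
cfg = parse_config(sample)
cfg["gpgpu_scheduler"] = "gto"  # override one option for an experiment
new_text = render_config(cfg)
```

Keeping options in insertion order means the rewritten file stays easy to diff against the original.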
-
Overview of the Architecture
-
Inside a SIMT Core
• Pipeline stages
  – Fetch
  – Decode
  – Issue
  – Read operand
  – Execution
  – Writeback
-
Fetch + Decode
• Arbitrate the I-cache among warps
  – A cache miss is handled by fetching again later
• The fetched instruction is decoded and then stored in the I-Buffer
  – 1 or more entries per warp
  – Only warps with vacant entries are considered in fetch
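The fetch policy above can be sketched as a small arbiter. This is a minimal model under stated assumptions: round-robin arbitration among warps, one fetch per cycle, an `icache_hit` callback standing in for the I-cache, and the PC itself standing in for the decoded instruction. The class names and the callback are hypothetical and are not GPGPU-Sim's actual classes.

```python
from collections import deque


class Warp:
    def __init__(self, wid, ibuffer_size=2):
        self.wid = wid
        self.pc = 0
        self.ibuffer = deque()
        self.ibuffer_size = ibuffer_size

    def has_vacancy(self):
        return len(self.ibuffer) < self.ibuffer_size


class FetchUnit:
    """Round-robin I-cache arbitration among warps (illustrative only)."""

    def __init__(self, warps, icache_hit):
        self.warps = warps
        self.icache_hit = icache_hit  # hypothetical callback: pc -> bool
        self.next_idx = 0

    def cycle(self):
        """Try one fetch; return the warp id that filled its I-buffer, or None."""
        n = len(self.warps)
        for offset in range(n):
            warp = self.warps[(self.next_idx + offset) % n]
            if not warp.has_vacancy():
                continue  # only warps with vacant I-buffer entries compete
            # Rotate priority past the selected warp, hit or miss.
            self.next_idx = (self.next_idx + offset + 1) % n
            if self.icache_hit(warp.pc):
                warp.ibuffer.append(warp.pc)  # stand-in for the decoded instruction
                warp.pc += 1
                return warp.wid
            return None  # miss: the warp simply fetches again later
        return None


warps = [Warp(0, ibuffer_size=1), Warp(1, ibuffer_size=1)]
fetch = FetchUnit(warps, icache_hit=lambda pc: True)
trace = [fetch.cycle() for _ in range(3)]  # third cycle: both I-buffers are full
```

On a miss the warp keeps its PC and loses its turn. It retries in a later cycle, matching the "fetch again later" policy on the slide.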
-
Issue
• Selects a warp with a ready instruction
• Acquires the active mask from the top of stack (TOS) of the SIMT stack
• Invalidates the issued instruction's I-Buffer entry
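The three issue steps can be sketched as follows. Assumptions: each SIMT stack entry is a `(pc, rpc, active_mask)` tuple with the TOS as the last element, and each I-buffer entry carries a ready flag set elsewhere (e.g., by the scoreboard). The warp-selection rule here is simply "first ready warp in list order", a placeholder for whatever scheduling policy is configured. All names are hypothetical.

```python
from collections import deque


class WarpState:
    def __init__(self, wid, simt_stack, ibuffer):
        self.wid = wid
        self.simt_stack = simt_stack  # list of (pc, rpc, active_mask); TOS is last
        self.ibuffer = deque(ibuffer)  # entries: (instruction, ready_flag)


def issue(warps):
    """Issue from the first warp whose oldest I-buffer entry is ready."""
    for warp in warps:
        if warp.ibuffer and warp.ibuffer[0][1]:
            inst, _ = warp.ibuffer.popleft()  # invalidate the I-buffer entry
            _, _, active_mask = warp.simt_stack[-1]  # active mask from the TOS
            return warp.wid, inst, active_mask
    return None


# Warp 0's instruction is not ready (e.g., a hazard); warp 1 is diverged.
w0 = WarpState(0, [(0, None, 0b1111)], [("add", False)])
w1 = WarpState(1, [(0, None, 0b1111), (8, 12, 0b0011)], [("mul", True)])
first = issue([w0, w1])
```

`first` is `(1, "mul", 0b0011)`. Only the two threads active at warp 1's TOS execute the issued instruction.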
-
December 2012 GPGPU-Sim Tutorial (MICRO 2012) 4: Microarchitecture Model
Scoreboard
• Checks for RAW and WAW dependency hazards
  – Flags instructions with hazards as not ready in the I-Buffer (masking them out from the scheduler)
• Instructions reserve their destination registers at issue
• The registers are released at writeback
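A per-warp scoreboard following the rules above can be sketched as a set of reserved destination registers. An instruction has a RAW hazard if it reads a reserved register, and a WAW hazard if it writes one. This is a hypothetical model, not GPGPU-Sim's scoreboard class, and it tracks register names as strings.

```python
class Scoreboard:
    """Per-warp reservation of destination registers (illustrative sketch)."""

    def __init__(self):
        self.reserved = {}  # warp id -> set of reserved register names

    def has_hazard(self, wid, dests, srcs):
        pending = self.reserved.get(wid, set())
        raw = any(r in pending for r in srcs)   # read-after-write
        waw = any(r in pending for r in dests)  # write-after-write
        return raw or waw

    def reserve(self, wid, dests):
        """Called at issue."""
        self.reserved.setdefault(wid, set()).update(dests)

    def release(self, wid, dests):
        """Called at writeback."""
        self.reserved.get(wid, set()).difference_update(dests)


sb = Scoreboard()
sb.reserve(0, ["R3"])                      # add.s32 R3, R1, R2 issued by warp 0
raw = sb.has_hazard(0, ["R5"], ["R3"])     # reads R3 before writeback: RAW
waw = sb.has_hazard(0, ["R3"], ["R1"])     # writes R3 again: WAW
other = sb.has_hazard(1, ["R5"], ["R3"])   # different warp: no hazard
sb.release(0, ["R3"])                      # writeback of the add
after = sb.has_hazard(0, ["R5"], ["R3"])
```

Reservations are kept per warp because each warp has its own registers. A pending write in warp 0 never blocks warp 1.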
-
Read Operand
• Operand Collector Architecture (US Patent: 7834881)
  – Interleaves operand fetches from different threads to achieve full utilization
Bank 0   Bank 1   Bank 2   Bank 3
R0       R1       R2       R3
R4       R5       R6       R7
R8       R9       R10      R11
…        …        …        …
add.s32 R3, R1, R2;   No conflict
mul.s32 R3, R0, R4;   Conflict at bank 0
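With four banks and register Rn stored in bank n mod 4, as in the table, the two example instructions can be checked mechanically. The sketch assumes each bank delivers one register per cycle. It counts the cycles one instruction needs to read its source operands and lists the conflicting banks. It is a simplified single-instruction model and ignores the interleaving across warps that the operand collector performs.

```python
from collections import Counter

NUM_BANKS = 4  # matches the 4-bank layout in the table


def bank_of(reg):
    """Map a register name like 'R6' to its bank (Rn -> n mod NUM_BANKS)."""
    return int(reg.lstrip("R")) % NUM_BANKS


def operand_read_cycles(srcs):
    """Cycles to read all source operands, one read per bank per cycle."""
    per_bank = Counter(bank_of(r) for r in srcs)
    return max(per_bank.values(), default=0)


def conflicting_banks(srcs):
    """Banks accessed more than once by the same instruction."""
    per_bank = Counter(bank_of(r) for r in srcs)
    return sorted(b for b, n in per_bank.items() if n > 1)


add_cycles = operand_read_cycles(["R1", "R2"])  # banks 1 and 2
mul_cycles = operand_read_cycles(["R0", "R4"])  # both in bank 0
mul_conflicts = conflicting_banks(["R0", "R4"])
```

The `add` reads both operands in one cycle, while the `mul` needs two cycles because of the conflict at bank 0.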
-
Operand Collector
[Figure: operand collector, fed from the instruction issue stage and feeding dispatch]
-
Execution
• ALU
  – Streaming processor (SP)
  – Special function unit (SFU)
• MEM
  – Shared memory
  – Local memory
  – Global memory
  – Texture memory
  – Constant memory
-
ALU Pipelines
• SIMD Execution Unit
• Fully Pipelined
• Each pipe may execute a subset of instructions
• Configurable bandwidth and latency (depending on the instruction)
• Default: SP + SFU pipes
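A fully pipelined unit accepts a new warp instruction every initiation interval, and each instruction completes after a fixed latency. That pair of numbers is the "bandwidth and latency" the slide calls configurable. The sketch below models one such pipe. The class is a hypothetical illustration rather than GPGPU-Sim's pipeline model, and the latency values in the example are made up.

```python
class PipelinedUnit:
    """Fully pipelined execution unit with fixed latency (illustrative)."""

    def __init__(self, latency, initiation_interval=1):
        self.latency = latency
        self.ii = initiation_interval
        self.next_free = 0   # earliest cycle a new instruction may enter
        self.in_flight = []  # (completion_cycle, instruction)

    def try_issue(self, cycle, inst):
        """Accept inst at this cycle unless the pipe cannot take one yet."""
        if cycle < self.next_free:
            return False     # structural hazard: pipe busy this cycle
        self.next_free = cycle + self.ii
        self.in_flight.append((cycle + self.latency, inst))
        return True

    def retire(self, cycle):
        """Return instructions whose results are ready at this cycle."""
        done = [i for c, i in self.in_flight if c <= cycle]
        self.in_flight = [(c, i) for c, i in self.in_flight if c > cycle]
        return done


sp = PipelinedUnit(latency=4)                  # hypothetical SP pipe
accepted = [sp.try_issue(0, "add"), sp.try_issue(1, "mul"), sp.try_issue(1, "sub")]
retired = [sp.retire(3), sp.retire(4), sp.retire(5)]
```

An SFU-like pipe would be modeled the same way with a longer latency and an initiation interval greater than 1. The SP + SFU default then amounts to two such objects with different parameters.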
-
Memory Unit
• Models timing for memory instructions
• Supports half-warp (16-thread) accesses
  – The unit is double-clocked
  – Each cycle services half the warp
• Has a private writeback path
[Figure: memory unit datapath with access coalescing, address generation unit (AGU), shared memory bank-conflict handling, constant cache, texture cache, data cache, memory port, and MSHR]
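Global memory accesses from the threads of a half-warp are coalesced into as few memory transactions as possible. The sketch below counts the distinct aligned segments touched by each half-warp. The 16-thread half-warp comes from the slide. The 128-byte segment size is an assumption, since real transaction sizes depend on the modeled GPU and the access width.

```python
HALF_WARP = 16       # threads serviced per (double-clocked) memory-unit cycle
SEGMENT_BYTES = 128  # assumed transaction granularity


def coalesced_transactions(addresses):
    """Return, per half-warp, the sorted base addresses of touched segments."""
    result = []
    for start in range(0, len(addresses), HALF_WARP):
        half = addresses[start:start + HALF_WARP]
        segments = {addr // SEGMENT_BYTES * SEGMENT_BYTES for addr in half}
        result.append(sorted(segments))
    return result


# 32 threads reading consecutive 4-byte words: one segment per half-warp.
contiguous = [4 * tid for tid in range(32)]
# 32 threads with a 128-byte stride: every thread touches its own segment.
strided = [128 * tid for tid in range(32)]

contiguous_txns = coalesced_transactions(contiguous)
strided_counts = [len(s) for s in coalesced_transactions(strided)]
```

The contiguous pattern needs one transaction per half-warp, while the strided pattern needs sixteen. That gap is what the coalescing stage in the figure accounts for.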
-
Writeback
• Writes the result to the register file
• The scoreboard updates the r-bit, releasing the destination register
-
Stack-Based Branch Divergence Hardware
• When a branch diverges
  – New entries are pushed onto the SIMT stack
  – The RPC (reconvergence PC) is set to the immediate post-dominator
  – The active mask indicates which threads are active
  – The PC is sent to the fetch unit
• When the RPC is reached
  – The TOS entry is popped
  – The PC of the new TOS is sent to the fetch unit
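The push/pop behavior above can be sketched as a stack of `[pc, rpc, active_mask]` entries. In this hypothetical Python model, a divergent branch redirects the current TOS to the reconvergence point, then pushes one entry for the not-taken path and one for the taken path. An entry is popped when its PC reaches its RPC. Assumptions: masks are integers with one bit per thread, and the caller supplies the immediate post-dominator. The 4-thread warp and the addresses in the example are made up.

```python
class SIMTStack:
    """Stack of [pc, rpc, active_mask]; the top of stack (TOS) is the last element."""

    def __init__(self, pc, full_mask):
        self.entries = [[pc, None, full_mask]]  # bottom entry has no RPC

    def tos(self):
        return self.entries[-1]

    def branch(self, taken_mask, target_pc, fallthrough_pc, ipdom_pc):
        """Handle a branch at the TOS; return the PC sent to the fetch unit."""
        top = self.tos()
        active = top[2]
        taken = taken_mask & active
        not_taken = active & ~taken_mask
        if taken == active:        # every active thread branches: no divergence
            top[0] = target_pc
        elif taken == 0:           # no active thread branches
            top[0] = fallthrough_pc
        else:                      # divergence: push one entry per path
            top[0] = ipdom_pc      # current entry resumes at reconvergence
            self.entries.append([fallthrough_pc, ipdom_pc, not_taken])
            self.entries.append([target_pc, ipdom_pc, taken])
        return self.tos()[0]

    def advance(self, next_pc):
        """Move the TOS to next_pc; pop while the TOS has reached its RPC."""
        self.tos()[0] = next_pc
        while len(self.entries) > 1 and self.tos()[0] == self.tos()[1]:
            self.entries.pop()
        return self.tos()[0]       # PC of the (new) TOS goes to fetch


stack = SIMTStack(pc=0x10, full_mask=0b1111)
pc = stack.branch(taken_mask=0b0011, target_pc=0x20,
                  fallthrough_pc=0x14, ipdom_pc=0x30)  # threads 0-1 take the branch
pc = stack.advance(0x30)  # taken path reaches the RPC: pop, fetch resumes at 0x14
pc = stack.advance(0x30)  # not-taken path reaches the RPC: pop, all 4 threads reconverge
```

The non-divergent cases only update the TOS PC. Entries are pushed only when the active threads actually split.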
-
Demo2
• Software framework overview
• Monitoring the warp scheduling order
• Comparing different scheduling policies
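Demo2 compares warp scheduling policies by monitoring the order in which warps issue. As a self-contained illustration (not the framework used in the demo), the sketch below replays a toy workload under two common policies:

- loose round-robin (LRR), which rotates priority past the last issued warp;
- greedy-then-oldest (GTO), which keeps issuing from the same warp until it stalls and then picks the oldest ready warp.

The readiness model is a made-up assumption: a warp stalls for a fixed number of cycles after every `stall_every` instructions. "Oldest" is approximated by the lowest warp id.

```python
def simulate(policy, num_warps=3, insts_per_warp=4, stall_every=2, stall_cycles=3):
    """Return the warp id issued each cycle (None = idle cycle)."""
    remaining = [insts_per_warp] * num_warps
    issued = [0] * num_warps
    ready_at = [0] * num_warps  # cycle at which each warp may issue again
    order = []
    last = -1                   # last issued warp (-1: none yet)
    cycle = 0
    while any(remaining):
        ready = [w for w in range(num_warps)
                 if remaining[w] and ready_at[w] <= cycle]
        if policy == "lrr":
            # Rotate priority: search starts just after the last issued warp.
            candidates = sorted(ready, key=lambda w: (w - last - 1) % num_warps)
        elif policy == "gto":
            # Greedy: prefer the last issued warp, otherwise the oldest (lowest id).
            candidates = sorted(ready, key=lambda w: (w != last, w))
        else:
            raise ValueError(f"unknown policy: {policy}")
        if candidates:
            w = candidates[0]
            remaining[w] -= 1
            issued[w] += 1
            # Toy stall model: every stall_every-th instruction blocks the warp.
            if issued[w] % stall_every == 0:
                ready_at[w] = cycle + 1 + stall_cycles
            else:
                ready_at[w] = cycle + 1
            last = w
            order.append(w)
        else:
            order.append(None)
        cycle += 1
    return order


lrr_order = simulate("lrr")
gto_order = simulate("gto")
```

Under these assumptions, LRR interleaves all warps so their stalls line up and leave one idle cycle. GTO runs each warp in bursts and hides the stalls, finishing one cycle earlier. Real workloads can favor either policy, which is exactly what a scheduling study measures.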
-
For More Information
• http://www.gpgpu-sim.org/
-
• Thanks! Questions?