gpgpu sim tutorial

8/18/2019 GPGPU Sim Tutorial


GPGPU-Sim Tutorial

Zhen Lin

North Carolina State University

Based on GPGPU-Sim Tutorial and Manual by UBC



Outline

• GPGPU-Sim Overview

• Demo1: Setup & Configuration

• GPGPU-Sim Internals

• Demo2: Scheduling Study



GPGPU-Sim in a Nutshell

• Microarchitecture timing model of contemporary GPUs

• Runs unmodified CUDA/OpenCL applications



What GPGPU-Sim Simulates

• Functional model

– PTX

– SASS

• Timing model for the compute part of a GPU

– Not for CPU or PCIe

– Only models microarchitecture timing relevant to compute



Functional model

• PTX

– A low-level, data-parallel virtual machine and instruction set architecture (ISA)

– Sits between CUDA and the hardware ISA (SASS)

– Stable ISA that spans multiple GPU generations

• SASS/PTXPLUS

– Hardware native ISA

– PTX -> Translate + Optimize -> SASS

– More accurate, but not well supported

• CUDA tool chain



Functional Model (PTX)

• Scalar ISA

• SSA representation: register allocation not done in PTX



Timing Model for GPU Micro-Architecture

• GPGPU-Sim simulates the timing model of a GPU running each launched CUDA kernel

• Reports stats (e.g. # cycles) for each kernel

• Excludes any time spent on data transfer over the PCIe bus

• The CPU is assumed to be idle while the GPU is working
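
The per-kernel stats are printed to standard output during simulation. A minimal sketch of extracting the cycle count of each kernel from such a log, assuming lines of the form `kernel_name = ...` and `gpu_sim_cycle = ...`; the exact layout may differ between GPGPU-Sim versions, and the sample log and the helper name `kernel_cycles` are made up for illustration:

```python
import re

def kernel_cycles(log_text):
    """Return a list of (kernel_name, cycles) pairs found in a simulator log.

    Assumption: each kernel's statistics contain a `kernel_name = <name>`
    line followed later by a `gpu_sim_cycle = <n>` line. Check this against
    the output of the simulator version actually in use.
    """
    results = []
    current = None
    for line in log_text.splitlines():
        m = re.match(r"\s*kernel_name\s*=\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"\s*gpu_sim_cycle\s*=\s*(\d+)", line)
        if m and current is not None:
            results.append((current, int(m.group(1))))
            current = None
    return results

# Made-up log excerpt, for illustration only.
SAMPLE_LOG = """\
kernel_name = _Z6vecAddPfS_S_
gpu_sim_cycle = 1500
gpu_tot_sim_cycle = 1500
kernel_name = _Z5scalePff
gpu_sim_cycle = 700
gpu_tot_sim_cycle = 2200
"""

if __name__ == "__main__":
    for name, cycles in kernel_cycles(SAMPLE_LOG):
        print(name, cycles)
```

These cycle counts cover kernel execution only; PCIe transfer time is not part of them.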



Demo1

• Setup

• Stats

• Configuration



Overview of the Architecture



Inside a SIMT Core

• Pipeline stages

– Fetch

– Decode

– Issue

– Read operand

– Execution

– Writeback



Fetch + Decode

• Arbitrates the I-cache among warps

– A cache miss is handled by fetching again later

• The fetched instruction is decoded and then stored in the I-Buffer

– 1 or more entries per warp

– Only warps with vacant entries are considered in fetch
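
A minimal sketch of the fetch arbitration described above. The slide only says that the I-cache is arbitrated among warps with vacant I-Buffer entries; the round-robin order and the function name `select_fetch_warp` are assumptions made for illustration:

```python
def select_fetch_warp(ibuffer_free, last_fetched):
    """Pick the next warp to fetch for, round-robin after `last_fetched`.

    ibuffer_free[w] is the number of vacant I-Buffer entries of warp w.
    Warps with no vacant entry are skipped. Returns the chosen warp id,
    or None if no warp can accept a new instruction.
    Round-robin is an assumed arbitration policy, not taken from the slides.
    """
    n = len(ibuffer_free)
    for offset in range(1, n + 1):
        w = (last_fetched + offset) % n
        if ibuffer_free[w] > 0:
            return w
    return None

if __name__ == "__main__":
    # Warps 0 and 2 have full I-Buffers; the search starts after warp 1.
    print(select_fetch_warp([0, 1, 0, 2], last_fetched=1))
```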



Issue

• Selects a warp with a ready instruction

• Acquires the active mask from the top of stack (TOS) of the SIMT stack

• Invalidates the I-Buffer entry



December 2012 GPGPU-Sim Tutorial (MICRO 2012) 4: Microarchitecture Model

Scoreboard

• Checks for RAW and WAW dependency hazards

– Flags instructions with hazards as not ready in the I-Buffer (masking them out from the scheduler)

• Instructions reserve their destination registers at issue

– The registers are released at writeback
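
The reserve/release behaviour can be sketched as a per-warp set of pending destination registers. This is an illustrative model under that assumption, not GPGPU-Sim's actual data structure; the class and method names are hypothetical:

```python
class Scoreboard:
    """Toy scoreboard: tracks registers with an outstanding write, per warp."""

    def __init__(self):
        # warp id -> set of destination registers reserved at issue
        self.pending = {}

    def has_hazard(self, warp, srcs, dst):
        """True if the instruction would read (RAW) or overwrite (WAW)
        a register that still has a pending write in the same warp."""
        busy = self.pending.get(warp, set())
        raw = any(r in busy for r in srcs)
        waw = dst in busy
        return raw or waw

    def reserve(self, warp, dst):
        """Called at issue: mark the destination register as pending."""
        self.pending.setdefault(warp, set()).add(dst)

    def release(self, warp, dst):
        """Called at writeback: the register is available again."""
        self.pending.get(warp, set()).discard(dst)

if __name__ == "__main__":
    sb = Scoreboard()
    sb.reserve(0, "R3")                       # add.s32 R3, R1, R2 issued
    print(sb.has_hazard(0, ["R3", "R4"], "R5"))  # reads R3 before writeback
    sb.release(0, "R3")                       # writeback of R3
    print(sb.has_hazard(0, ["R3", "R4"], "R5"))
```

An instruction for which `has_hazard` is true would be marked not ready in the I-Buffer, so the scheduler skips it.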




Read Operand

• Operand Collector Architecture (US Patent: 7834881)

– Interleave operand fetch from different threads to achieve full utilization

Bank 0   Bank 1   Bank 2   Bank 3
R0       R1       R2       R3
R4       R5       R6       R7
R8       R9       R10      R11
…        …        …        …

add.s32 R3, R1, R2; No Conflict

mul.s32 R3, R0, R4; Conflict at bank 0
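
The two examples follow from the bank layout in the table, where register Rn lives in bank n mod 4. A small sketch that checks the source operands of an instruction for bank conflicts, assuming only source reads compete for banks (the helper names are hypothetical):

```python
NUM_BANKS = 4  # matches the four banks shown in the table

def bank_of(reg):
    """Bank index of a register named like 'R5', using Rn -> n mod NUM_BANKS."""
    return int(reg[1:]) % NUM_BANKS

def source_conflicts(srcs):
    """Return {bank: [registers]} for every bank read by more than one
    distinct source register. Destination registers are ignored here,
    which is a simplifying assumption."""
    by_bank = {}
    for reg in dict.fromkeys(srcs):  # drop duplicate registers, keep order
        by_bank.setdefault(bank_of(reg), []).append(reg)
    return {bank: regs for bank, regs in by_bank.items() if len(regs) > 1}

if __name__ == "__main__":
    print(source_conflicts(["R1", "R2"]))  # add.s32 R3, R1, R2
    print(source_conflicts(["R0", "R4"]))  # mul.s32 R3, R0, R4
```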




Operand Collector

[Figure: operand collector block diagram; labels: "(from instruction issue stage)", "dispatch"]



Execution

• ALU

• Stream processor (SP)

• Special function unit (SFU)

• MEM

• Shared memory

• Local memory

• Global memory

• Texture memory

• Constant memory




ALU Pipelines

• SIMD Execution Unit

• Fully Pipelined

• Each pipe may execute a subset of instructions

• Configurable bandwidth and latency (depending on the instruction)

• Default: SP + SFU pipes




Memory Unit

• Models timing for memory instructions

• Supports half-warp (16 threads)

– Double-clocks the unit

– Each cycle services half of the warp

• Has a private writeback path

[Figure: memory unit datapath; labels: AGU, Access Coalesc., Shared Mem, Bank Conflict, Const. Cache, Texture Cache, Data Cache, Memory Port, MSHR]
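
A rough sketch of what access coalescing does for one half-warp: the 16 thread addresses are grouped into aligned memory segments, and each distinct segment becomes one memory transaction. The 128-byte segment size and the function name `coalesce` are assumptions for illustration; the simulator's actual rules depend on its configuration:

```python
HALF_WARP = 16
SEGMENT_BYTES = 128  # assumed segment size, not taken from the slides

def coalesce(addresses, segment_bytes=SEGMENT_BYTES):
    """Return the sorted base addresses of the segments touched by the
    byte addresses of one half-warp (at most 16 threads)."""
    if len(addresses) > HALF_WARP:
        raise ValueError("a half-warp has at most 16 threads")
    return sorted({(a // segment_bytes) * segment_bytes for a in addresses})

if __name__ == "__main__":
    unit_stride = [4 * tid for tid in range(HALF_WARP)]      # consecutive floats
    large_stride = [128 * tid for tid in range(HALF_WARP)]   # one segment each
    print(len(coalesce(unit_stride)), len(coalesce(large_stride)))
```

With consecutive 4-byte accesses the whole half-warp falls into a single segment, while a 128-byte stride needs one transaction per thread.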



Writeback

• Write result to register file

• Scoreboard updates the r-bit



Stack-Based Branch Divergence Hardware

• When a branch diverges

– New entries are pushed onto the SIMT stack

– RPC (reconvergence PC) is set to the immediate post-dominator

– The active mask indicates which threads are active

– The PC is sent to the fetch unit

• When the RPC is reached

– Pop the TOS

– The PC of the new TOS is sent to the fetch unit
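
The push/pop behaviour can be sketched as follows. This is a simplified model of a SIMT reconvergence stack for a single warp, with hypothetical class and method names; active masks are plain integers with one bit per thread:

```python
class SIMTStack:
    """Toy SIMT stack. Each entry is [pc, rpc, active_mask]."""

    def __init__(self, start_pc, mask):
        # The base entry has no reconvergence PC.
        self.entries = [[start_pc, None, mask]]

    def top(self):
        return self.entries[-1]

    def advance(self, next_pc):
        """Move the TOS to next_pc and pop entries that reached their RPC."""
        self.top()[0] = next_pc
        while len(self.entries) > 1 and self.top()[0] == self.top()[1]:
            self.entries.pop()

    def diverge(self, taken_pc, taken_mask, fallthrough_pc, ipdom_pc):
        """Handle a branch executed by the TOS entry."""
        mask = self.top()[2]
        taken = taken_mask & mask
        not_taken = mask & ~taken_mask
        if taken == 0 or not_taken == 0:
            # All active threads agree: no divergence, just follow the branch.
            self.advance(taken_pc if taken else fallthrough_pc)
            return
        # The current entry resumes at the reconvergence point.
        self.top()[0] = ipdom_pc
        for pc, m in ((fallthrough_pc, not_taken), (taken_pc, taken)):
            if pc != ipdom_pc:  # a path that starts at the RPC has no work
                self.entries.append([pc, ipdom_pc, m])

if __name__ == "__main__":
    s = SIMTStack(start_pc=0, mask=0b1111)
    s.diverge(taken_pc=10, taken_mask=0b0011, fallthrough_pc=1, ipdom_pc=20)
    print(s.top())   # taken path runs first with mask 0b0011
    s.advance(20)    # taken path reaches the RPC and is popped
    print(s.top())   # fall-through path with mask 0b1100
    s.advance(20)    # second path reaches the RPC
    print(s.top())   # all four threads active again at PC 20
```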



Demo2

• Software framework overview

• To monitor the warp scheduling order

• Compare with different scheduling policies
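
To give a feel for how scheduling policies change the warp issue order, here is a toy model comparing loose round-robin ("lrr") and greedy-then-oldest ("gto"). It is a sketch under strong assumptions (one issue slot per cycle, a fixed stall after every instruction, lower warp id treated as older) and is not the simulator's implementation; the function name `simulate` is hypothetical:

```python
def simulate(policy, num_warps, insts_per_warp, stall):
    """Return the list of warp ids in the order they issue.

    After issuing, a warp is not ready again for `stall` extra cycles.
    """
    remaining = [insts_per_warp] * num_warps
    ready_at = [0] * num_warps
    order = []
    last = -1  # warp that issued most recently
    cycle = 0
    while any(remaining):
        ready = [w for w in range(num_warps)
                 if remaining[w] and ready_at[w] <= cycle]
        if ready:
            if policy == "lrr":
                # First ready warp after the last issued one, circularly.
                w = min(ready, key=lambda x: (x - last - 1) % num_warps)
            elif policy == "gto":
                # Stay on the same warp while it is ready, else the oldest.
                w = last if last in ready else min(ready)
            else:
                raise ValueError("unknown policy: " + policy)
            order.append(w)
            remaining[w] -= 1
            ready_at[w] = cycle + 1 + stall
            last = w
        cycle += 1
    return order

if __name__ == "__main__":
    for policy in ("lrr", "gto"):
        print(policy, simulate(policy, num_warps=2, insts_per_warp=2, stall=0))
```

With no stalls, the round-robin policy interleaves the warps while the greedy policy drains one warp before moving to the next; once stalls are added the two orders can converge.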



For More Information

• http://www.gpgpu-sim.org/




• Thanks! Questions?
