noc routing algorithms - inf.pucrs.br · critério tabela implementação baseada em critério...
TRANSCRIPT
NoC Routing Algorithms
Fernando Moraes 9/abril/2014
Roteamento: Classificações
Estático
DinâmicoCritérioMomento da realização doroteamento Multicast
UnicastCritérioNúmero de destinatários
Fonte
Centralizado
Distribuído
CritérioLocal onde as decisões deroteamento são tomadas Máquina de estado
TabelaCritérioImplementação baseada em
DeterminísticoCritérioAdaptatividade
Regressivo
Progressivo
Não mínimo
Mínimo
Parcial
Completo
CritérioProgressividade
CritérioMinimalidade
CritérioNúmero de caminhos
Adaptativo
Técnicas de roteamento Livelock e Deadlock
• Livelock – Quando uma mensagem trafega permanentemente pela rede
sem chegar ao seu destino
• Deadlock – Ocorre quando existe uma dependência cíclica de recursos na
rede e as mensagens são paralisadas
Cab
2
A
3
B C
4Cbc
D
1
Ccd
Cda
X
1 3
4
2
(a) (b) (c)
Roteamento: Exemplo de deadlock Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Roteamento: Exemplo de deadlock Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Roteamento: Exemplo de deadlock Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Roteamento: Exemplo de deadlock Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Roteamento: Evitando o deadlock
• Deve-se evitar o surgimento de ciclos na rede
Ciclos na malha 2-D
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Núcleo Núcleo Núcleo
Roteamento: Evitando o deadlock • Soluções • Roteamento XY
– Proíbe qualquer curva do tipo Y-para-X
– É determinístico
• Roteamento West-first – Proibe apenas as voltas em
direção ao oeste – É parcialmente adaptativo
10
The Turn Model • Turn-based model to avoid deadlock
• Possible turns = {NW, NE, SW, SE, WN, WS, EN, ES}
• Disallow >= 2 turns
• XY routing only allows turns from X to Y {EN, ES, WN, WS}
• West-first routing prohibits turns to west {NW, SW} – Offers full adaptiveness to paths that route east – Not fair to all paths
Roteamento: Algoritmo XY
(0,0)
(7,7)
1 2
3
4
• Deterministic • All messages from Src to Dest will traverse the same path • Common example: Dimension Order Routing (DOR)
– Message traverses network dimension by dimension – Aka XY routing
• Cons: – Eliminates any path diversity provided by topology – Poor load balancing
• Pros: – Simple and inexpensive to implement – Deadlock free
West-first routing algorithm • The prohibited turns are the two to the West
– if XT ≤ XS, packets are routed deterministically, as in the XY algorithm, (paths 1 and 2)
– if XT > XS packets can be routed adaptively in East, North or South directions (paths 3 and 4)
1 2
3
4
(0,0)
(7,7)
(3) e (4) – caminhos adapta5vos (1) e (2) – caminhos determinís5cos
obstáculos
13
West First não mínimo • Route a packet first west if necessary, and then adaptively south , east, and
north. • Both minimal and non-minimal paths are shown. • [Dally and Seitz] proof show that a routing algorithm is deadlock free if the
channels in the interconnection network can be numbered so that the algorithm routes every packet along channels with strictly decreasing numbers.
North-Last routing algorithm • The prohibited turns are the two when traveling
North – - if YT ≤ YS packets are routed deterministically (paths 2 and 3) – - if YT > YS packets can be routed adaptively in West, East, or
South directions (paths 1 and 4)
1 2
3
4
(0,0)
(7,7)
(1) e (4) – caminhos adapta5vos (2) e (3) – caminhos determinís5cos
obstáculos
Negative-First routing algorithm • packets are routed first in negative directions,
i.e., to the North or to the West directions – if (XT≤XS and YT≥YS) or (XT≥XS and YT≤YS) packets are
deterministically routed - paths 1 /3 – all other conditions allow some form of adaptive routing - paths
4 / 2
1 2
3
4
(0,0)
(7,7)
(2) e (4) – caminhos adapta5vos (1) e (3) – caminhos determinís5cos
obstáculos
Odd-Even Wormhole Routing
• In the previous methods, at least half of S/D pairs are restricted to having one minimal path, while full adaptiveness is provided to the others
– Unfair! • Odd-even turn routing offers solution:
– Even column: no EN or ES turn – Odd column: no NW or SW turn
Odd-Even Routing
Relative destination
Possible turns
NE SE, WN
NW SW, EN
SE NE, WS
SW NW, ES
even col
odd col
S
D
even odd even
D
S
even odd even when routing east and dest is in even col…
(don’t go into dest. col unless row matches)
(don’t go N/S in odd col)
D
S
even odd even
S
D
even odd even
when routing west…
On average, 2 routing options once for every 5 routes
(1.2 opt/route)
Valiant’s Routing Algorithm
• To route from s to d, randomly choose intermediate node d’ – Route from s to d’ and
from d’ to d. • Randomizes any traffic
pattern – All patterns appear to be
uniform random – Balances network load
• Non-minimal
d’
d
s
Minimal Oblivious
• Valiant’s: Load balancing comes at expense of significant hop count increase – Destroys locality
• Minimal Oblivious: achieve some load balancing, but use shortest paths – d’ must lie within minimum
quadrant – 6 options for d’ – Only 3 different paths
d
s
Rotamento Hamiltoniano
• Algoritmo de roteamento Hamiltoniano – Livre de deadlock – Simples de se obter uma versão determinística a partir da
versão adaptativa – Suporte a algoritmos multicast
• Caminho Hamiltoniano – Caminho acíclico no qual é possível atingir todos os nodos de
um grafo passando apenas uma vez em cada nodo – Roteadores rotulados de 0 a N-1 – Caminho segue a ordem crescente ou decrescente dos
rótulos
Serviço de comunicação com roteamento diferenciado
• Versão não-mínima parcialmente adaptativa
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Serviço de comunicação com roteamento diferenciado
• Versão não-mínima parcialmente adaptativa
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Serviço de comunicação com roteamento diferenciado
• Versão mínima determinística
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Serviço de comunicação com roteamento diferenciado
• Versão mínima determinística
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Collective communication service
• Dual-path multicast algorithm – Originally proposed for Multicomputers (1994) – 2D mesh topology – Hamiltonian routing algorithm
• Multicast messages – Parallel search algorithms – Blocked matrix multiplication – Cache coherence protocol – Control messages
Collective communication service
• Dual-path multicast algorithm – Targets set divided in 2 sub-sets – A message copy sent to each sub-set
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
7 8 11 12
3 1 0
> 5
Ascending
< 5
Descending
27
Routing Adaptivity in Function of Traffic ■ Congestion Signaling
– Single bit (congested/not congested) – Bus (free slots, congestion levels)
Leste
Oeste
Norte
Sul
cong
cong
cong
cong
Fully Adaptive Routing with VCs • Can achieve fully adaptive routing with VCs
– Problem: minimize required number of VCs – Use of symetrical routing algorithms
c
VIRTUAL CHANNEL
c
c
c
PHYSICAL CHANNEL
CROSSBAR 9x9EAST
OUTPUT CHANNELS
INPUT CHANNELS
WEST
NORTH
SOUTH
LOCAL
EAST WEST NORTH SOUTH LOCAL
29
Replicated Channels x Virtual Channels
Switch Control
Crossbar 10x10North
North
South
East
West
Local
Input buffers
Router
South
East
West
Local
Switch Control
Crossbar 10x10
North2
North1
South2
South1
East2
East1
West2
West1
Local2
Local1
North2
North1
South2
South1
East2
East1
West2
West1
Local2
Local1
Input buffers
Router
X
2 physical channels 2 virtual channels
• Smaller area: same crossbar, mux/demux suppressed
30
Replicated Channels x Virtual Channels
• Latency/throughput evaluation – Mesh 4x4 – XY – Packet switching – Virtual channel router: 16-flit deep buffer/virtual channel – Replicated channel router: 16-flit deep buffer/physical
channel
00 10 20 30
01 11 21 31
02 12 22 32
03 13 23 33
• Each source transmits 500 packets/257-flits size
• Packets are injected in the NoC at the channel rate
31
Replicated Channels x Virtual Channels
• Average latency/throughput packet
• Replicated channels when compared to virtual channels – reduce 47.3% the average latency – doubles the average throughput
• Expected result, since the internal NoC bandwidth doubles
32
Replicated Channels x Virtual Channels
• FPGA area consumption
• 12% area reduction for a single 5 ports router • 15% area reduction for a 4x4 mesh NoC
33
Replicated Channels x Virtual Channels
• ASIC area consumption
■ Replicated channels – Alternative to virtual channels – Smaller logic area and increased performance – Power dissipation due to the switching activity in wires:
future work
■ 4% area reduction for a single 5 ports router ■ 6.4% area reduction for a 4x4 mesh NoC