* Using GPU to do packet forwarding
@ 2010-08-24 22:15 Stephen Hemminger
From: Stephen Hemminger @ 2010-08-24 22:15 UTC (permalink / raw)
To: David Miller, Eric Dumazet, netdev
Interesting paper:
http://www.ndsl.kaist.edu/~kyoungsoo/papers/packetshader.pdf
One point of general interest is the expense of the current
skb scheme. At 10G, they measure 60% of CPU time going to skb
alloc/free; see the paper for their alternative of using one huge
packet buffer. Also, they managed to shrink the skb down to 8 bytes!
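
Roughly, the huge-buffer approach amounts to carving one preallocated
region into fixed-size cells and recycling them, instead of doing a
kmalloc/kfree per packet. A minimal userspace sketch of the idea (the
names and layout are mine, not the paper's):

#include <stddef.h>
#include <stdlib.h>

#define CELL_SIZE  2048          /* fixed slot, big enough for an MTU frame */
#define NR_CELLS   65536

struct pkt_pool {
        unsigned char *base;     /* one contiguous allocation */
        unsigned int   head;     /* next cell to hand out (ring index) */
};

static int pkt_pool_init(struct pkt_pool *p)
{
        p->base = malloc((size_t)CELL_SIZE * NR_CELLS);
        p->head = 0;
        return p->base ? 0 : -1;
}

/* No per-packet alloc/free: cells are reused in ring order once the
 * NIC and the forwarding path are done with the previous lap. */
static unsigned char *pkt_cell(struct pkt_pool *p)
{
        unsigned char *cell = p->base + (size_t)CELL_SIZE * p->head;

        p->head = (p->head + 1) % NR_CELLS;
        return cell;
}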
* Re: Using GPU to do packet forwarding
@ 2010-08-24 22:31 David Miller
From: David Miller @ 2010-08-24 22:31 UTC (permalink / raw)
To: shemminger; +Cc: dada1, netdev
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 24 Aug 2010 15:15:07 -0700
> Interesting paper:
> http://www.ndsl.kaist.edu/~kyoungsoo/papers/packetshader.pdf
> One point of general interest is the expense of the current
> skb scheme. At 10G, they measure 60% of CPU time going to skb
> alloc/free; see the paper for their alternative of using one huge
> packet buffer. Also, they managed to shrink the skb down to 8 bytes!
It just means SLAB or SLUB are broken if the number is that high.
Also, their old kernel has none of the TX multiqueue work we've
done.
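
For context, the point of the multiqueue work is that transmit no
longer serializes on one queue lock; flows hash across several queues,
each with its own lock, so CPUs stop contending with each other. A toy
sketch of that idea (illustrative only, not the real netdev code):

#include <stdint.h>
#include <pthread.h>

#define NR_TX_QUEUES 16

struct tx_queue {
        pthread_mutex_t lock;    /* per-queue lock, not one global one */
        /* the descriptor ring would live here */
};

static struct tx_queue txq[NR_TX_QUEUES];

static void txq_init(void)
{
        for (int i = 0; i < NR_TX_QUEUES; i++)
                pthread_mutex_init(&txq[i].lock, NULL);
}

/* Pick a queue from a flow hash so packets of one flow stay ordered
 * while different flows (and different CPUs) spread across queues. */
static struct tx_queue *pick_txq(uint32_t flow_hash)
{
        return &txq[flow_hash % NR_TX_QUEUES];
}

static void xmit(uint32_t flow_hash /*, packet */)
{
        struct tx_queue *q = pick_txq(flow_hash);

        pthread_mutex_lock(&q->lock);
        /* post the packet to this queue's descriptor ring */
        pthread_mutex_unlock(&q->lock);
}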
It should simply be a lockless list unlink, and how that can consume
60% of the CPU compared to the routing table lookup is beyond me.
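
To spell out what I mean by a lockless list unlink, the hot allocation
case is basically a per-cpu freelist pop. Grossly simplified (an
illustration, not the actual slub.c code):

#include <stddef.h>

struct cpu_cache {
        void *freelist;          /* head of this cpu's free object list */
};

/* Hot path: unlink the first free object.  The first word of each
 * free object stores the pointer to the next one, so the "alloc" is
 * one load and one store, no locks taken. */
static void *fastpath_alloc(struct cpu_cache *c)
{
        void *object = c->freelist;

        if (!object)
                return NULL;     /* slow path: refill from a new slab */

        c->freelist = *(void **)object;
        return object;
}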
But I suppose one can easily choose an environment and configuration
that make the numbers for one's technique look better.
Next, they use a binary-search IPv6 lookup, which is going to touch
more data than a multi-way trie scheme would; it also avoids the
routing cache, since IPv6 lacks one. They even admit that they've
purposely rigged the test so that the working set doesn't fit in the
CPU cache.
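
To make the data-touching point concrete: a binary search over a large
sorted table does about log2(N) scattered probes, most of them cache
misses once the table outgrows the cache, while a multi-way trie
consumes several address bits per node visited. A crude sketch of just
the probe pattern (not a correct longest-prefix match, and not their
code):

#include <stdint.h>
#include <string.h>

struct rt6_entry {
        uint8_t key[16];         /* IPv6 prefix boundary, network order */
        int     nexthop;
};

/* Each iteration lands on a different part of the table: on a table
 * that doesn't fit in cache, that's roughly one miss per probe. */
static int rt6_bsearch(const struct rt6_entry *tbl, int n,
                       const uint8_t addr[16])
{
        int lo = 0, hi = n - 1, best = -1;

        while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;

                if (memcmp(addr, tbl[mid].key, 16) < 0)
                        hi = mid - 1;
                else {
                        best = mid;
                        lo = mid + 1;
                }
        }
        return best < 0 ? -1 : tbl[best].nexthop;
}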
It's an interesting paper, but we're going to see 64-cpu and 128-cpu
x86 machines commonly quite soon, so the parallelism advantage these
guys are getting (at a cost of generality) will shrink steadily over
time.