netdev.vger.kernel.org archive mirror
From: starlight@binnacle.cx
To: chetan loke <loke.chetan@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	linux-kernel@vger.kernel.org, netdev <netdev@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Lameter <cl@gentwo.org>, Willy Tarreau <w@1wt.eu>,
	Ingo Molnar <mingo@elte.hu>,
	Stephen Hemminger <stephen.hemminger@vyatta.com>,
	Benjamin LaHaise <bcrl@kvack.org>, Joe Perches <joe@perches.com>,
	lokechetan@gmail.com, Con Kolivas <conman@kolivas.org>,
	Serge Belyshev <belyshev@depni.sinp.msu.ru>
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
Date: Fri, 07 Oct 2011 14:37:47 -0400
Message-ID: <6.2.5.6.2.20111007143050.039bd578@binnacle.cx>
In-Reply-To: <CAAsGZS4s1wTWW1j7FRUWW9jqpPUVF3Q46AMa7+njvE1ckX0Snw@mail.gmail.com>

At 02:09 PM 10/7/2011 -0400, chetan loke wrote:
>I'm a little confused. Seems like there are
>conflicting goals. If you want to bypass the
>kernel-protocol-stack then you have the following
>options: a) kernel af_packet. This is where we
>would get a chance to test all the kernel features
>etc.

Perhaps I haven't been sufficiently clear.
The "packet socket" mode I refer to in the
earlier post was using AF/PF_PACKET mode sockets
as in

   socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

I have run it in both normal (read()) and
memory-mapped modes.  MMAP mode is slightly more
expensive due to the cache pressure from the
additional copy.  On the 6174, MMAP seems to be a
smidgen better in certain tests, but in the end
the read() and mapped approaches perform
effectively identically, and both match the cost
of UDP sockets almost exactly.
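
For anyone who has not used the mapped mode, here
is a minimal sketch of how such a socket and its
receive ring can be set up.  It is not the code
from our application.  It assumes the TPACKET_V1
ring layout; the helper names (fill_ring_req,
open_packet_socket, map_rx_ring) and any sizes
passed in are hypothetical, chosen only for
illustration.  Opening the socket requires
CAP_NET_RAW.

```c
#include <arpa/inet.h>        /* htons */
#include <assert.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct tpacket_req, PACKET_RX_RING */
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in a ring request.  The kernel expects
 * tp_frame_nr to equal (frames per block) * (number of blocks), and
 * tp_block_size to be a multiple of the page size. */
void fill_ring_req(struct tpacket_req *req, unsigned block_size,
                   unsigned block_nr, unsigned frame_size)
{
    assert(frame_size != 0 && block_size % frame_size == 0);
    memset(req, 0, sizeof(*req));
    req->tp_block_size = block_size;
    req->tp_block_nr   = block_nr;
    req->tp_frame_size = frame_size;
    req->tp_frame_nr   = (block_size / frame_size) * block_nr;
}

/* Hypothetical helper: open a raw packet socket that sees all
 * protocols.  Returns the descriptor, or -1 with errno set
 * (typically EPERM without CAP_NET_RAW). */
int open_packet_socket(void)
{
    return socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
}

/* Hypothetical helper: ask the kernel for an RX ring on fd and map
 * it into the process.  Returns the ring base or MAP_FAILED. */
void *map_rx_ring(int fd, const struct tpacket_req *req)
{
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING,
                   req, sizeof(*req)) < 0)
        return MAP_FAILED;
    return mmap(NULL, (size_t)req->tp_block_size * req->tp_block_nr,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

In normal mode one simply calls read() or recv()
on the descriptor from open_packet_socket().  In
mapped mode one calls fill_ring_req() and
map_rx_ring() once, then walks the frames in the
returned region, using poll() to wait for new
ones.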

>b) Use non-commodity(?) NICs(from vendors
>you mentioned): where it might have some on-board
>memory(cushion) and so it can absorb the spikes
>and can also smoothen out too many
>PCI-transactions for bursty (and small payload -
>as in 64 byte traffic). But wait, when you use the
>libs provided by these vendors, then their
>driver(especially the Rx path) is not so much
>working in inline mode as NIC drivers in case a)
>above. This driver with a special Rx-path purely
>exists for managing your mmap'd queues. So
>of course it's going to be faster than the
>traditional inline drivers. In this partial-inline
>mode, the adapter might i) batch the packets and
>ii) send a single notification to the
>host-side. With that single event you are now
>processing 1+ packets.

Kernel bypass is probably the best answer for
what we do.  The problem has been the lack of
maturity in the vendors' driver software.  It looks
like it's reaching a point where they cover our
use case.  As mentioned
earlier, Solarflare could not match the Intel
82599 + ixgbe for this app last year.  Was a
disaster.  Myricom is focused on UDP (better
for us), but only just added multi-core IRQ
doorbell wakeups in recent months.  Previously
one had to accept all IRQs on a single core or
poll, neither of which works for us.

>You got it. In case of tilera there are two modes:
>tile-cpu in device mode: beats most of the
>non-COTS NICs. It runs linux on the adapter
>side. Imagine having the flexibility/power to
>program the ASIC using your favorite OS. It's
>orgasmic. So go for it!  tile-cpu in host-mode:
>Yes, it could be a game changer.

We almost went for the 1st gen Tile64 outboard
NIC approach, but were concerned about whether
they would survive--still are.  Intel has
crushed more than a few competitors along
the way.  If Google or Facebook buys into the
Tile-Gx it becomes a safe choice overnight.
