From: starlight@binnacle.cx
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: linux-kernel@vger.kernel.org, netdev <netdev@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Lameter <cl@gentwo.org>, Willy Tarreau <w@1wt.eu>,
	Ingo Molnar <mingo@elte.hu>,
	Stephen Hemminger <stephen.hemminger@vyatta.com>,
	Benjamin LaHaise <bcrl@kvack.org>, Joe Perches <joe@perches.com>,
	Chetan Loke <Chetan.Loke@netscout.com>,
	Con Kolivas <conman@kolivas.org>,
	Serge Belyshev <belyshev@depni.sinp.msu.ru>
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
Date: Fri, 07 Oct 2011 02:13:42 -0400
Message-ID: <6.2.5.6.2.20111007020308.039be248@binnacle.cx>
In-Reply-To: <1317966007.3457.47.camel@edumazet-laptop>
At 07:40 AM 10/7/2011 +0200, Eric Dumazet wrote:
>
>Thats exactly the opposite : Your old kernel is not fast enough 
>to enter/exit NAPI on every incoming frame.
>
>Instead of one IRQ per incoming frame, you have less interrupts:
>A napi run processes more than 1 frame.
Please look at the data I posted.  Batching
appears to give 80us *better* latency in this
case--with the old kernel.
>Now increase your incoming rate, and you'll discover a new 
>kernel will be able to process more frames without losses.
Data loss is driven mainly by CPU utilization
per message, since buffers are configured huge
at all points.  Newer kernels use more CPU per
message, so they break down at significantly
lower message rates than older kernels.  I
determined this last year when testing SLES 11
and unmodified 2.6.27.
I can run a max-rate comparison for 2.6.39.4
if you like.
>About your thread model :
>
>You have one thread that reads the incoming
>frame, and do a distribution on several queues
>based on some flow parameters. Then you wakeup 
>a second thread.
>
>This kind of model is very expensive and triggers
>lot of false sharing.
Please note my use of the word "nominal" and the
overall context.  Both thread-per-socket and
dual-thread handoff handling tests were performed
with the clear observation that the former is
the production model and works best at maximum
load.
However at 50% load (the test here), the
dual-thread handoff model is the clear winner
over *all* other scenarios.
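For readers unfamiliar with the term, the
dual-thread handoff model discussed above can be
sketched roughly as follows.  This is only a
minimal illustration, not the test application:
the names receiver, worker and run_demo are
hypothetical, a single queue replaces the
per-flow queues, and a Unix datagram socketpair
stands in for the UDP socket so that the sketch
runs without a network.

```python
import queue
import socket
import threading


def receiver(sock, handoff, count):
    """Thread 1: read datagrams and hand each one off to the worker."""
    try:
        for _ in range(count):
            data = sock.recv(65535)
            handoff.put(data)  # wakes the worker blocked in get()
    finally:
        handoff.put(None)  # sentinel: always let the worker exit


def worker(handoff, results):
    """Thread 2: sleep until work arrives, then process it."""
    while True:
        data = handoff.get()
        if data is None:
            break
        results.append(len(data))  # placeholder for real message handling


def run_demo(messages):
    # Assumption: a Unix datagram socketpair stands in for the UDP socket.
    rx, tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    rx.settimeout(5.0)  # avoid hanging forever if a datagram never arrives
    handoff = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=receiver, args=(rx, handoff, len(messages))),
        threading.Thread(target=worker, args=(handoff, results)),
    ]
    for t in threads:
        t.start()
    for m in messages:
        tx.send(m)
    for t in threads:
        t.join()
    rx.close()
    tx.close()
    return results
```

In the thread-per-socket model, by contrast, each
thread would call recv() on its own socket and
process the message itself, with no queue and no
second wakeup.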
>New kernels are able to perform this fanout in kernel land.
Yes, of course I am interested in Intel's flow
director and similar solutions, netfilter especially.
2.6.32 is only recently available in commercial
deployment and I will be looking at that next up.
Mainly I'll be looking at complete kernel bypass
with 10G.  Myricom looks like it might be good.
I tested Solarflare last year and it was a bust
for high-volume UDP (one thread), but I've heard
that they fixed that and I will revisit it.
In relation to the above observation that
less CPU-per-packet is best for avoiding
data loss, it also correlates somewhat
(though not always) with better latency.
I've used the 'e1000e' 1G network interfaces
for these tests because they work better than
the multi-queue 'igb' (Intel 82576) and
'ixgbe' (Intel 82599) in all scenarios other
than maximum-stress load.  The reason is
apparently that the old 'e1000e' driver has
shorter, more efficient code paths while
'igb' and 'ixgbe' use significantly more
CPU to process the same number of packets.
I can quantify that if it is of interest.
At present the only place where multi-queue
NICs beat four 1G NICs is at breaking-point
traffic loads.  There, asymmetries in the
traffic can't be easily redistributed by the
kernel, and the resulting hot spots become
weakest-link breakpoints.
Please understand that I am not a curmudgeonly
Luddite.  I realize that sometimes it is
necessary to trade efficiency for scalability.
All I'm doing here is trying to quantify the
current state of affairs and make recommendations
in a commercial environment.  For the moment
all the excellent enhancements designed to
permit extreme scalability are costing too
much in efficiency to be worth using in
production.  When/if Tilera delivers their
100-core CPU in volume, this state of affairs
will likely change.  I imagine both Intel
and AMD have many-core solutions in the pipe
as well, though it will be interesting to see
if Tilera has the essential patents and can
surpass the two majors in the market and the
courts.