From: Eric Dumazet <eric.dumazet@gmail.com>
To: starlight@binnacle.cx
Cc: linux-kernel@vger.kernel.org, netdev <netdev@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Christoph Lameter <cl@gentwo.org>, Willy Tarreau <w@1wt.eu>,
Ingo Molnar <mingo@elte.hu>,
Stephen Hemminger <stephen.hemminger@vyatta.com>,
Benjamin LaHaise <bcrl@kvack.org>, Joe Perches <joe@perches.com>,
Chetan Loke <Chetan.Loke@netscout.com>,
Con Kolivas <conman@kolivas.org>,
Serge Belyshev <belyshev@depni.sinp.msu.ru>
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
Date: Fri, 07 Oct 2011 07:40:07 +0200
Message-ID: <1317966007.3457.47.camel@edumazet-laptop>
In-Reply-To: <6.2.5.6.2.20111006231958.039bb570@binnacle.cx>
On Thursday, 6 October 2011 at 23:27 -0400, starlight@binnacle.cx wrote:
> After writing the last post, the large
> difference in IRQ rate between the older
> and newer kernels caught my eye.
>
> I wonder if the hugely lower rate in the older
> kernels reflects a more agile shifting
> into and out of NAPI mode by the network
> bottom-half.
>
> In this test the sending system
> pulses data out on millisecond boundaries
> due to the behavior of nsleep(), which
> is used to establish the playback pace.
>
> If the older kernels are switching to NAPI
> for much of the surge and then switching out
> once the pulse falls off, it might
> conceivably result in much better latency
> and overall performance.
>
> All tests were run with Intel 82571
> network interfaces and the 'e1000e'
> device driver. Some used the driver
> packaged with the kernel, some used
> the Intel driver compiled from the source
> found on sourceforge.net. Never could
> detect any difference between the two.
>
> Since data in the production environment
> also tends to arrive in bursts, I don't find
> the pulsing playback behavior a detriment.
>
That's exactly the opposite: your old kernel is not fast enough to
enter/exit NAPI on every incoming frame.

Instead of one IRQ per incoming frame, you get fewer interrupts:
a NAPI run processes more than one frame.

Now increase your incoming rate, and you'll discover that a new kernel
is able to process more frames without losses.
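
One rough way to see how many frames each interrupt is covering is to sample the IRQ counters and the RX packet counter over the same interval and divide. The sketch below is only an illustration: the helper names (sum_irqs, frames_per_irq, measure) and the device name eth0 are assumptions, and it matches the device by the last column of /proc/interrupts, which will not catch multiqueue vectors named like eth0-rx-0.

```shell
#!/bin/sh
# Sum the per-CPU interrupt counts on every line of /proc/interrupts-style
# input (read from stdin) whose last field equals the given name.
sum_irqs() {
    awk -v name="$1" '
        $NF == name {
            # Field 1 is the IRQ number ("45:"); the numeric fields that
            # follow it are the per-CPU counters.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }'
}

# Average number of frames handled per interrupt, given the growth of the
# RX packet counter and of the IRQ counter over the same interval.
frames_per_irq() {
    awk -v pkts="$1" -v irqs="$2" 'BEGIN {
        if (irqs == 0) { print "n/a"; exit }
        printf "%.1f\n", pkts / irqs
    }'
}

# Hypothetical wrapper: sample a device for a number of seconds (default 10)
# and print the frames-per-interrupt ratio for that window.
measure() {
    dev=$1; secs=${2:-10}
    irq0=$(sum_irqs "$dev" < /proc/interrupts)
    pkt0=$(cat "/sys/class/net/$dev/statistics/rx_packets")
    sleep "$secs"
    irq1=$(sum_irqs "$dev" < /proc/interrupts)
    pkt1=$(cat "/sys/class/net/$dev/statistics/rx_packets")
    frames_per_irq "$(( pkt1 - pkt0 ))" "$(( irq1 - irq0 ))"
}
```

Running something like "measure eth0 10" during playback on each kernel would give a ratio close to 1 when nearly every frame raises its own interrupt, and a larger value when a NAPI run is batching several frames.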
About your thread model:

You have one thread that reads the incoming frames and distributes them
over several queues based on some flow parameters. Then you wake up a
second thread.

This kind of model is very expensive and triggers a lot of false sharing.

New kernels are able to perform this fanout in kernel land.

You really should take a look at Documentation/networking/scaling.txt
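
As a minimal sketch of what that document describes for RPS, on kernels that provide it: each receive queue has an rps_cpus file under sysfs that takes a hexadecimal CPU bitmap. The helper names below (cpu_mask, set_rps) and the eth0 / rx-0 values in the comments are assumptions for illustration, and the simple shift only covers CPU numbers that fit in one shell integer.

```shell
#!/bin/sh
# Build the hexadecimal CPU bitmap expected by rps_cpus from a list of
# CPU numbers, e.g. "cpu_mask 0 1 2 3" prints "f".
# Only valid for CPU numbers that fit in the shell's integer width;
# larger systems need the comma-separated multi-word form instead.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Write the bitmap for one receive queue of a device (needs root).
# Example call, with assumed names: set_rps eth0 rx-0 0 1 2 3
set_rps() {
    dev=$1; queue=$2; shift 2
    cpu_mask "$@" > "/sys/class/net/$dev/queues/$queue/rps_cpus"
}
```

With such a mask in place the kernel picks the target CPU from a hash of the flow, so the distribution step no longer has to happen in a user-space reader thread.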
[ Another way of doing this fanout is to use some iptables rules:
check the following commit changelog for an idea ]
commit e8648a1fdb54da1f683784b36a17aa65ea56e931
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Jul 23 12:59:36 2010 +0200

    netfilter: add xt_cpu match

    In some situations a CPU match permits a better spreading of
    connections, or selects targets only for a given cpu.

    With Receive Packet Steering (RPS) or a multiqueue NIC and appropriate
    IRQ affinities, we can distribute traffic on available cpus, per session.
    (all RX packets for a given flow are handled by a given cpu)

    Some legacy applications not being SMP friendly, one way to scale a
    server is to run multiple copies of them.

    Instead of randomly choosing an instance, we can use the cpu number as a
    key so that the softirq handler for a whole instance runs on a single
    cpu, maximizing cache effects in TCP/UDP stacks.

    Using NAT for example, a four-way machine might run four copies of the
    server application, using a separate listening port for each instance,
    but still presenting a unique external port:

    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
            -j REDIRECT --to-port 8080

    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
            -j REDIRECT --to-port 8081

    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
            -j REDIRECT --to-port 8082

    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
            -j REDIRECT --to-port 8083
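
The four rules above follow one pattern (CPU n redirects to port 8080 + n), so for a different CPU count they could be generated rather than typed. This is only a sketch: gen_cpu_redirect_rules is a hypothetical helper, it keeps the tcp match from the changelog example, and it prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# Print one REDIRECT rule per CPU: packets for the external port whose
# softirq runs on CPU n go to the instance listening on base_port + n.
# Arguments: external port, first instance port, number of CPUs.
gen_cpu_redirect_rules() {
    ext_port=$1; base_port=$2; ncpus=$3
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        printf 'iptables -t nat -A PREROUTING -p tcp --dport %s -m cpu --cpu %s -j REDIRECT --to-port %s\n' \
            "$ext_port" "$cpu" "$(( base_port + cpu ))"
        cpu=$(( cpu + 1 ))
    done
}
```

Calling "gen_cpu_redirect_rules 80 8080 4" prints the same four rules as in the changelog, each on a single line; piping the output to sh as root would install them.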