public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: starlight@binnacle.cx
Cc: linux-kernel@vger.kernel.org, netdev <netdev@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Lameter <cl@gentwo.org>, Willy Tarreau <w@1wt.eu>,
	Ingo Molnar <mingo@elte.hu>,
	Stephen Hemminger <stephen.hemminger@vyatta.com>,
	Benjamin LaHaise <bcrl@kvack.org>, Joe Perches <joe@perches.com>,
	Chetan Loke <Chetan.Loke@netscout.com>,
	Con Kolivas <conman@kolivas.org>,
	Serge Belyshev <belyshev@depni.sinp.msu.ru>
Subject: Re: big picture UDP/IP performance question re 2.6.18  -> 2.6.32
Date: Fri, 07 Oct 2011 07:40:07 +0200
Message-ID: <1317966007.3457.47.camel@edumazet-laptop>
In-Reply-To: <6.2.5.6.2.20111006231958.039bb570@binnacle.cx>

On Thursday, 06 October 2011 at 23:27 -0400, starlight@binnacle.cx wrote:
> After writing the last post, the large
> difference in IRQ rate between the older
> and newer kernels caught my eye.
> 
> I wonder if the hugely lower rate in the older
> kernels reflects a more agile shifting
> into and out of NAPI mode by the network
> bottom-half.
> 
> In this test the sending system
> pulses data out on millisecond boundaries
> due to the behavior of nsleep(), which
> is used to establish the playback pace.
> 
> If the older kernels are switching to NAPI
> for much of the surge and then switching out
> once the pulse falls off, it might
> conceivably result in much better latency
> and overall performance.
> 
> All tests were run with Intel 82571
> network interfaces and the 'e1000e'
> device driver.  Some used the driver
> packaged with the kernel, some used the
> Intel driver compiled from the source
> found on sourceforge.net.  I never could
> detect any difference between the two.
> 
> Since data in the production environment
> also tends to arrive in bursts, I don't find
> the pulsing playback behavior a detriment.
> 

That's exactly the opposite: your old kernel is not fast enough to
enter/exit NAPI on every incoming frame.

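Instead of one IRQ per incoming frame, you get fewer interrupts: while a
NAPI poll is still running, more frames arrive and are handled by that
same run, so a single NAPI run processes more than one frame.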

Now increase your incoming rate, and you'll discover that a new kernel
is able to process more frames without losses.
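
One way to check this on your own box is to compare received frames with
NIC interrupts over a fixed interval; a ratio well above 1 means each NAPI
run is covering several frames. The script below is only a sketch, not an
existing tool: the helper names irq_count and frames_per_irq and the file
name napi_ratio.sh are hypothetical, and it assumes the driver's IRQ names
in /proc/interrupts contain the interface name (any matching TX vectors
are counted too).

```shell
#!/bin/sh
# Hypothetical diagnostic: how many received frames does each NIC
# interrupt cover? A ratio well above 1 means NAPI polls batch frames.

# Sum the per-CPU counts of every /proc/interrupts line matching $1.
# Assumption: IRQ names contain the interface name; TX vectors that
# match are included in the total.
irq_count() {
    awk -v ifc="$1" '$0 ~ ifc {
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i
    } END { print s + 0 }' /proc/interrupts
}

# Frames per interrupt with one decimal; prints "inf" if no IRQ fired.
frames_per_irq() {
    awk -v p="$1" -v i="$2" 'BEGIN {
        if (i == 0) print "inf"; else printf "%.1f\n", p / i
    }'
}

# Sample only when an interface is given, e.g. "sh napi_ratio.sh eth0".
if [ "$#" -ge 1 ]; then
    iface=$1
    stats=/sys/class/net/$iface/statistics/rx_packets
    p0=$(cat "$stats"); i0=$(irq_count "$iface")
    sleep 1
    p1=$(cat "$stats"); i1=$(irq_count "$iface")
    frames=$((p1 - p0)); irqs=$((i1 - i0))
    echo "$iface: $frames frames, $irqs interrupts," \
         "$(frames_per_irq "$frames" "$irqs") frames/IRQ"
fi
```

Running it on both kernels at the same offered load should show the older
kernel with a higher frames/IRQ ratio, consistent with the explanation above.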

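About your thread model:

You have one thread that reads each incoming frame and distributes it
to one of several queues based on some flow parameters. Then you wake up
a second thread.

This kind of model is very expensive and triggers a lot of false sharing:
the frame data and the queue bookkeeping are touched by two threads,
usually on different CPUs, so cache lines bounce between them.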

New kernels are able to perform this fanout in kernel land.

You really should take a look at Documentation/networking/scaling.txt
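
As a rough illustration of that document's RPS/RFS knobs, here is a
sketch that enables RPS on every RX queue of an interface and sizes the
RFS tables. It assumes a kernel that has RPS/RFS (they appeared in
2.6.35, so not the 2.6.32 kernel discussed here), root privileges and at
most 32 CPUs; the helper name cpu_mask, the file name rps_setup.sh and
the choice of "all online CPUs" are assumptions, not a recommended
configuration.

```shell
#!/bin/sh
# Hypothetical sketch: enable RPS and RFS on every RX queue of an
# interface via the sysfs/procfs knobs described in scaling.txt.
# Assumptions: RPS/RFS-capable kernel, run as root, at most 32 CPUs.

# Hex bitmap with bits 0..N-1 set, the format rps_cpus expects.
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Apply only when an interface is given, e.g. "sh rps_setup.sh eth0".
if [ "$#" -ge 1 ]; then
    iface=$1
    mask=$(cpu_mask "$(getconf _NPROCESSORS_ONLN)")
    entries=32768   # global RFS socket flow table size (scaling.txt suggestion)
    echo "$entries" > /proc/sys/net/core/rps_sock_flow_entries
    set -- /sys/class/net/"$iface"/queues/rx-*
    nq=$#           # number of RX queues
    for q in "$@"; do
        echo "$mask" > "$q/rps_cpus"                 # CPUs allowed to process this queue
        echo $((entries / nq)) > "$q/rps_flow_cnt"   # per-queue share of flow table
    done
fi
```

With RFS, the kernel tries to process each flow on the CPU where the
consuming thread last ran, which is roughly the fanout your dispatcher
thread does by hand.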

[ Another way of doing this fanout is to use some iptables rules;
check the following commit changelog for an idea. ]

commit e8648a1fdb54da1f683784b36a17aa65ea56e931
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Fri Jul 23 12:59:36 2010 +0200

    netfilter: add xt_cpu match
    
    In some situations a CPU match permits better spreading of
    connections, or selecting targets only for a given cpu.
    
    With Receive Packet Steering or a multiqueue NIC and appropriate IRQ
    affinities, we can distribute traffic on available cpus, per session.
    (all RX packets for a given flow are handled by a given cpu)
    
    Some legacy applications are not SMP friendly; one way to scale such
    a server is to run multiple copies of it.
    
    Instead of randomly choosing an instance, we can use the cpu number as a
    key so that the softirq handler for a whole instance is running on a
    single cpu, maximizing cache effects in TCP/UDP stacks.
    
    Using NAT for example, a four-way machine might run four copies of the
    server application, using a separate listening port for each instance,
    but still presenting a unique external port:
    
    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
            -j REDIRECT --to-port 8080
    
    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
            -j REDIRECT --to-port 8081
    
    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
            -j REDIRECT --to-port 8082
    
    iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
            -j REDIRECT --to-port 8083
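
The four rules above follow a simple pattern (CPU n redirects to base
port + n), so for a machine with a different CPU count they can be
generated. This is a hypothetical helper, not part of iptables: the name
print_cpu_redirect_rules and its arguments are assumptions, and it only
prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical generator for xt_cpu REDIRECT rules, one per CPU.
# Usage: print_cpu_redirect_rules NCPUS BASE_PORT PUBLIC_PORT
# Prints the commands only; pipe to "sh" as root to apply them.
print_cpu_redirect_rules() {
    ncpus=$1 base=$2 dport=$3
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        # Traffic whose softirq runs on CPU $cpu goes to instance base+cpu.
        echo "iptables -t nat -A PREROUTING -p tcp --dport $dport" \
             "-m cpu --cpu $cpu -j REDIRECT --to-port $((base + cpu))"
        cpu=$((cpu + 1))
    done
}

# Example: reproduce the four-way setup from the changelog.
print_cpu_redirect_rules 4 8080 80
```

Each server instance still has to listen on its own port (8080 + n), and
the rules only keep a flow on one CPU if RPS or IRQ affinity already
steers that flow consistently.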
    

