netdev.vger.kernel.org archive mirror
From: Stephen Hemminger <shemminger@vyatta.com>
To: Wei Gu <wei.gu@ericsson.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	netdev <netdev@vger.kernel.org>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>
Subject: Re: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
Date: Fri, 8 Apr 2011 07:49:02 -0700
Message-ID: <20110408074902.2bd10e6b@nehalam>
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA682@ESGSCCMS0001.eapac.ericsson.se>

On Fri, 8 Apr 2011 22:10:50 +0800
Wei Gu <wei.gu@ericsson.com> wrote:

> Hi,
> I see what you mean.
> But as I described before, I start eth10 with 8 rx queues and 8 tx queues, and then bind each of these 8 tx/rx queues to CPU cores 24-32 (NUMA3), which I think should give the best performance in my case (this is true on Linux 2.6.32).
> single queue -> single CPU
> To describe the packet generator a little: I configured the IXIA to continuously increment the destination IP address towards the test server, so the packets were evenly distributed across the receive queues of eth10. According to the IXIA tools the transmit shape was really good, without many peaks.
> 
> What I observed on Linux 2.6.38 during the test is that no ksoftirqd was stressed (< 03% SI on each of cores 24-31) while the packet loss happens, so we are not really stressing the CPU :). It looks like we are limited by some memory bandwidth (DMA) on this release.
> 
> With the same test case on 2.6.32 there is no such problem at all. It runs pretty stably at > 2Mpps without rx_missed_errors. There is no HW limitation on this DL580.
> 
> 
> BTW, what is this "swapper"?
> +      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
> +      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
> Why is ixgbe_poll attributed to swapper/perf?
> 
> Thanks
> WeiGu
> 
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, April 08, 2011 8:57 PM
> To: Wei Gu
> Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
> Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
> 
> On Friday 08 April 2011 at 20:19 +0800, Wei Gu wrote:
> > Hi again,
> > I did more testing with CONFIG_DMAR disabled, using both the ixgbe
> > shipped with 2.6.38 and the Intel-released 3.2.10/3.1.15.
> > In all these tests we can get >1Mpps with 400-byte packets, but it is
> > not stable at all; there is a huge number of missed errors with the
> > CPU 100% idle:
> > ethtool -S eth10 |grep rx_missed_errors
> >
> >         rx_missed_errors: 76832040
> >
> > SUM: 1102212 ETH8: 0  ETH10: 1102212 ETH6: 0 ETH4: 0
> > SUM: 521841 ETH8: 0  ETH10: 521841 ETH6: 0 ETH4: 0
> > SUM: 426776 ETH8: 0  ETH10: 426776 ETH6: 0 ETH4: 0
> > SUM: 927520 ETH8: 0  ETH10: 927520 ETH6: 0 ETH4: 0
> > SUM: 1171995 ETH8: 0  ETH10: 1171995 ETH6: 0 ETH4: 0
> > SUM: 855980 ETH8: 0  ETH10: 855980 ETH6: 0 ETH4: 0
> >
> >
> > Do you know if there are other options in the kernel that can cause a
> > high rate of rx_missed_errors with low CPU usage? (No problem on 2.6.32
> > with the same test case.)
> >
> > perf  record:
> > +     69.74%          swapper  [kernel.kallsyms]          [k] poll_idle
> > +     11.62%          swapper  [kernel.kallsyms]          [k] intel_idle
> > +      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
> > +      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
> > +      0.77%             perf  [kernel.kallsyms]          [k] skb_copy_bits
> > +      0.64%          swapper  [kernel.kallsyms]          [k] skb_copy_bits
> > +      0.48%             perf  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> > +      0.44%          swapper  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> > +      0.36%          swapper  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> > +      0.35%          swapper  [kernel.kallsyms]          [k] kfree
> > +      0.35%             perf  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> >
> 
> 
> Make sure enough CPUs serve interrupts _before_ even starting your stress test.
> 
> Then, make sure traffic is distributed to many different queues.
> If a single flow is used, it probably uses a single queue -> single CPU.
> 
> Say you have IRQ affinities set to fffffffffffff (all CPUs able to serve IRQ X,Y,Z,T,...)
> 
> Then you have a network burst (because you start your packet generator at full rate), spread over many queues.
> 
> CPU0 takes hard interrupt for queue 0, eth8, and queues NAPI mode.
> CPU0 takes hard interrupt for queue 0, eth10, and queues NAPI mode.
> CPU0 takes hard interrupt for queue 1, eth8, and queues NAPI mode.
> CPU0 takes hard interrupt for queue 1, eth10, and queues NAPI mode.
> CPU0 takes hard interrupt for queue 2, eth8, and queues NAPI mode.
> CPU0 takes hard interrupt for queue 2, eth10, and queues NAPI mode.
> ...
> CPU0 takes hard interrupt for queue X, eth8, and queues NAPI mode.
> ...
> 
> Then softirq can start, and only CPU0 is able to handle NAPI for all the queued devices. You are stuck, with CPU0 never leaving ksoftirqd.
> 
> NAPI handling is always performed on the CPU that received the hardware interrupt, until we exit NAPI (and rearm interrupt delivery).
> It cannot migrate to an "idle CPU".

For performance, you need to assign each network interrupt to a single
CPU. There is no load balancing effect in the IRQ controller.
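
As an illustration of the one-IRQ-per-CPU assignment described above, here is a
minimal Python sketch. It is not taken from this thread. It assumes the driver
names its vectors "<ifname>-TxRx-<queue>" in /proc/interrupts (naming differs
between driver versions), and the helper names cpu_mask and plan_affinity are
made up for the example. It only computes the hex masks; it does not write them.

```python
import re


def cpu_mask(cpu):
    """Return an smp_affinity string (comma-separated 32-bit hex groups)
    with only the bit for `cpu` set."""
    value = 1 << cpu
    groups = []
    while True:
        groups.append("%08x" % (value & 0xFFFFFFFF))
        value >>= 32
        if value == 0:
            break
    # The most significant group comes first in the kernel's format.
    return ",".join(reversed(groups))


def plan_affinity(interrupts_text, ifname, first_cpu):
    """Map IRQ number -> mask, pinning queue N to CPU first_cpu + N.

    Assumption: vector names look like 'eth10-TxRx-0'.
    """
    pattern = re.compile(
        r"^\s*(\d+):.*\s" + re.escape(ifname) + r"-TxRx-(\d+)\s*$"
    )
    plan = {}
    for line in interrupts_text.splitlines():
        match = pattern.match(line)
        if match:
            irq = int(match.group(1))
            queue = int(match.group(2))
            plan[irq] = cpu_mask(first_cpu + queue)
    return plan
```

With the contents of /proc/interrupts as input, plan_affinity(text, "eth10", 24)
would give one mask per queue IRQ. Each mask can then be written, as root, to
/proc/irq/<irq>/smp_affinity. If irqbalance is running it may overwrite the
settings, so it usually has to be stopped first.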

If you have a multi-socket system, then it is a good idea to put the IRQs
for the NICs on the same socket as the bus interface. Multi-socket systems
are really NUMA, and putting an IRQ on a non-local CPU has a measurable impact.
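
To find which CPUs are local to a NIC, one option is to read the NUMA node that
sysfs reports for the PCI device and then that node's CPU list. The sketch
below is an illustration under assumptions, not something posted in this
thread: it relies on the sysfs files device/numa_node and
devices/system/node/node<N>/cpulist being present, and the function names
parse_cpulist and local_cpus are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def local_cpus(ifname, sysfs=Path("/sys")):
    """Return the CPUs on the NUMA node of `ifname`, or None if unknown."""
    node_file = sysfs / "class/net" / ifname / "device/numa_node"
    node = int(node_file.read_text())
    if node < 0:
        # The kernel reports -1 when it has no NUMA information for the device.
        return None
    cpulist_file = sysfs / "devices/system/node" / ("node%d" % node) / "cpulist"
    return parse_cpulist(cpulist_file.read_text())
```

On the machine discussed here, local_cpus("eth10") would be expected to return
the cores of the node the card sits on, which can then be used as the targets
for the per-queue IRQ affinities.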



