netdev.vger.kernel.org archive mirror
From: "Kok, Auke" <auke-jan.h.kok@intel.com>
To: Chris Snook <csnook@redhat.com>
Cc: "H. Willstrand" <h.willstrand@gmail.com>,
	Anton Titov <a.titov@host.bg>,
	netdev@vger.kernel.org,
	Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: Bad network performance over 2Gbps
Date: Tue, 15 Apr 2008 14:05:43 -0700	[thread overview]
Message-ID: <480518A7.50703@intel.com> (raw)
In-Reply-To: <48051734.1000107@redhat.com>

Chris Snook wrote:
> Kok, Auke wrote:
>> H. Willstrand wrote:
>>> [Changed mail list]
>>>
>>> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov <a.titov@host.bg> wrote:
>>>> I use Linux for serving a huge amount of static web content on a few
>>>> servers. When network traffic goes above 2Gbit/sec, ksoftirqd/5 (not
>>>> every time 5, but every time just one) starts using exactly 100% CPU
>>>> time and packet loss starts preventing traffic from going up. When
>>>> the network traffic is lower than 1.9Gbit, the ksoftirqds use 0% CPU
>>>> according to top.
>>>>
>>>> The uplink is six gigabit Intel cards bonded together using the
>>>> 802.3ad algorithm with xmit_hash_policy set to layer3+4. On the
>>>> other side is a Cisco 2960 switch. The machine has two quad-core
>>>> Intel Xeons @2.33GHz.
>>>>
>>>> Here goes a screen snapshot of the "top" command. The described
>>>> behavior has nothing to do with the 13% io-wait; it happens even
>>>> when io-wait is 0%.
>>>>  http://www.titov.net/misc/top-snap.png
>>>>
>>>>  kernel configuration:
>>>>  http://www.titov.net/misc/config.gz
>>>>
>>>> /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>>>> uname -a:
>>>>  http://www.titov.net/misc/misc.txt.gz
>>>>
>>>>  Is it a Linux bug or some hardware limitation?
>>
>> I'm wondering if this is not a classical demonstration of the NAPI-irq
>> trap, where after migration all the interrupts from the various cards
>> end up on a single CPU, and because of NAPI, once they're busy polling
>> they won't ever migrate away from that CPU again.
>>
>> Have you looked at `cat /proc/interrupts` before and after this happens?
>>
>> My guess is that your specific situation can benefit from setting
>> smp_affinity and pinning the NIC IRQs so that you at least spread the
>> load over multiple CPUs (but preferably ones that share the same
>> cache!); that should help relieve the situation.
>>
>> Alternatively, you might even see an improvement by disabling NAPI;
>> depending on the driver that you're using, this might be possible.
>>
>> I actually don't know much about bonding and how it affects all of
>> this, but my guess is that it's a less important factor in this issue.
>>
>> Cheers,
>>
>> Auke
> 
> I'm not sure that spreading IRQs out completely is necessarily a good
> idea, due to cache line ping-pong.  I suspect you'll get optimal
> performance by assigning the six IRQs to two cores that share an L2 cache.
> 
> Still, I think you're on to something here.  Disabling NAPI and instead
> tuning the cards' interrupt coalescing settings might allow irqbalance
> to do a better job than it is currently.
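
A minimal sketch of what that coalescing tweak could look like; the
device names and the rx-usecs value here are illustrative assumptions,
not taken from this setup, so check `ethtool -c` output and the driver's
supported range first:

```shell
#!/bin/sh
# Sketch: batch receive interrupts at the NIC instead of taking one
# interrupt per packet. Builds the command as a string so it can be
# reviewed (or echoed) before running; applying it needs root.
coalesce_cmd() {
    printf 'ethtool -C %s rx-usecs %s' "$1" "$2"
}

coalesce_cmd eth0 125; echo
# apply to all six bonded slaves (hypothetical names):
# for i in 0 1 2 3 4 5; do $(coalesce_cmd "eth$i" 125); done
```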

Well, I posted another reply to him after looking at the debug output he
posted, and it appears that the in-kernel irqbalance is the culprit: the
100% ksoftirqd is because his interrupts are being balanced across all
cores, pretty much guaranteeing full cache misses on every single
receive, not to mention needless task migration.
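
A minimal sketch of how one could spot that from /proc/interrupts (the
column layout assumed here is the usual one: IRQ number, one count per
CPU, then the chip and device names):

```shell
#!/bin/sh
# Sketch: report, for each IRQ line in a /proc/interrupts-style file,
# which CPU has serviced it most. Under the in-kernel balancer the
# busiest CPU for a NIC IRQ keeps hopping around over time.
busiest_cpu() {
    awk 'NR > 1 {
        max = -1; cpu = -1
        for (i = 2; i <= NF; i++) {
            if ($i !~ /^[0-9]+$/) break  # counts end at the chip name
            if ($i + 0 > max) { max = $i + 0; cpu = i - 2 }
        }
        if (cpu >= 0) printf "%s -> CPU%d (%d interrupts)\n", $1, cpu, max
    }' "$1"
}

if [ -r /proc/interrupts ]; then busiest_cpu /proc/interrupts; fi
```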

I definitely think we should disable the in-kernel irqbalance option by default :)

As for the best solution with six adapters, that's not clear since
they're all on the same bridge, but hard-setting the affinity to two
cores that share an L2 seems to give the best results quickly.
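
A minimal sketch of such hard-setting; the IRQ numbers and the choice of
cores 2 and 3 are illustrative assumptions, the real IRQ numbers come
from /proc/interrupts:

```shell
#!/bin/sh
# Sketch: build the CPU bitmask for /proc/irq/N/smp_affinity and pin a
# set of NIC IRQs to it. Cores 2 and 3 stand in for two cores that
# share an L2; bit N set in the mask means CPU N may service the IRQ.
CORES="2 3"
MASK=0
for c in $CORES; do
    MASK=$(( MASK | (1 << c) ))
done
printf 'affinity mask: %x\n' "$MASK"

# Applying it needs root, and irqbalance must be stopped first or it
# will simply rewrite these values (IRQ numbers are hypothetical):
# for irq in 50 51 52 53 54 55; do
#     printf '%x' "$MASK" > /proc/irq/$irq/smp_affinity
# done
```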

Cheers,

Auke


Thread overview: 13+ messages
     [not found] <1208282804.23631.27.camel@localhost>
2008-04-15 20:15 ` Bad network performance over 2Gbps H. Willstrand
2008-04-15 20:34   ` Kok, Auke
2008-04-15 20:59     ` Chris Snook
2008-04-15 21:05       ` Kok, Auke [this message]
2008-04-17 10:02       ` Anton Titov
2008-04-17 17:37         ` [PATCH] " Kok, Auke
2008-04-20 12:08           ` Denys Fedoryshchenko
2008-04-21 13:19           ` Pavel Machek
2008-04-21 16:38             ` Kok, Auke
2008-04-21 15:28           ` Ingo Molnar
2008-04-21 16:58             ` Kok, Auke
2008-04-21 18:35               ` Andi Kleen
2008-04-22  5:07           ` Bill Fink
