From: Chris Snook
Subject: Re: Bad network performance over 2Gbps
Date: Tue, 15 Apr 2008 16:59:32 -0400
Message-ID: <48051734.1000107@redhat.com>
References: <1208282804.23631.27.camel@localhost> <175f5a0f0804151315x1e192fc7p7dac1e84fd154211@mail.gmail.com> <48051173.5030802@intel.com>
In-Reply-To: <48051173.5030802@intel.com>
To: "Kok, Auke"
Cc: "H. Willstrand", Anton Titov, netdev@vger.kernel.org, Jesse Brandeburg

Kok, Auke wrote:
> H. Willstrand wrote:
>> [Changed mail list]
>>
>> On Tue, Apr 15, 2008 at 8:06 PM, Anton Titov wrote:
>>> I use Linux for serving a huge amount of static web content on a few
>>> servers. When network traffic goes above 2Gbit/sec, ksoftirqd/5 (not
>>> always 5, but always just one) starts using exactly 100% CPU time and
>>> packet loss starts preventing traffic from going up. When the network
>>> traffic is lower than 1.9Gbit, the ksoftirqds use 0% CPU according to top.
>>>
>>> The uplink is six gigabit Intel cards bonded together using the 802.3ad
>>> algorithm with xmit_hash_policy set to layer3+4. On the other side is a
>>> Cisco 2960 switch. The machine has two quad-core Intel Xeons @2.33GHz.
>>>
>>> Here is a screenshot of the "top" command. The described behavior has
>>> nothing to do with the 13% io-wait. It happens even at 0% io-wait.
>>> http://www.titov.net/misc/top-snap.png
>>>
>>> Kernel configuration:
>>> http://www.titov.net/misc/config.gz
>>>
>>> /proc/interrupts, lspci, dmesg (nothing interesting there), ifconfig,
>>> uname -a:
>>> http://www.titov.net/misc/misc.txt.gz
>>>
>>> Is it a Linux bug or some hardware limitation?
>
> I'm wondering if this isn't a classic demonstration of the NAPI/IRQ trap,
> where after migration all the interrupts from the various cards end up on a
> single CPU, and because of NAPI, once they're busy polling, they never
> migrate away from that CPU again.
>
> Have you looked at `cat /proc/interrupts` before and after this happens?
>
> My guess is that your specific situation can benefit from setting
> smp_affinity: pinning the NIC IRQs so that you're at least spreading the
> load over multiple CPUs (but preferably ones that share a cache!) should
> help relieve the situation.
>
> Alternatively, you might even see an improvement by disabling NAPI.
> Depending on the driver that you're using, this might be possible.
>
> I actually don't know much about bonding and how it affects all this, but
> my guess is that it's a less important factor in this issue.
>
> Cheers,
>
> Auke

I'm not sure that spreading IRQs out completely is necessarily a good idea,
due to cache-line ping-pong. I suspect you'll get optimal performance by
assigning the six IRQs to two cores that share an L2 cache.

Still, I think you're on to something here. Disabling NAPI and instead tuning
the cards' interrupt coalescing settings might allow irqbalance to do a better
job than it currently does.

--
Chris
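For anyone wanting to try the "two cores that share an L2 cache" suggestion, here is a rough Python sketch (not from the thread) of the two steps involved: diffing two /proc/interrupts snapshots to see which CPU is absorbing a NIC's interrupts, and building the hex bitmask that /proc/irq/<n>/smp_affinity expects. The helper names are hypothetical, and the IRQ numbers and the CPU pair that shares an L2 cache are assumptions you must look up on the actual box, e.g. from /proc/interrupts and the cache/index*/shared_cpu_map files under /sys/devices/system/cpu/cpuN/.

```python
"""Sketch: find which CPU services an IRQ and build an smp_affinity mask.

Hypothetical helpers; IRQ and CPU numbers must come from the real machine.
"""


def parse_interrupts(text):
    """Return {irq_label: [count_cpu0, count_cpu1, ...]} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():  # stop at the controller/device columns
                break
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


def busiest_cpu(before, after, irq):
    """Index of the CPU that handled the most `irq` interrupts between snapshots."""
    deltas = [a - b for a, b in zip(after[irq], before[irq])]
    return max(range(len(deltas)), key=deltas.__getitem__)


def affinity_mask(cpus):
    """Hex mask for smp_affinity: bit i set means CPU i may service the IRQ.

    Only CPUs 0-31 are handled; larger masks are written as
    comma-separated 32-bit groups, which this sketch does not produce.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0-31")
        mask |= 1 << cpu
    return format(mask, "x")


def set_affinity(irq, cpus):
    """Pin numeric IRQ `irq` to `cpus` (requires root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpus) + "\n")
```

Usage would be along the lines of set_affinity(50, [0, 1]) as root for each of the six bonded NICs' IRQs, with 50 and CPUs 0/1 standing in for the real values. If irqbalance is running it may later rewrite these masks, so it is worth re-checking /proc/interrupts afterwards.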