From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenny Chang
Subject: Re: Multicast packet loss
Date: Wed, 04 Feb 2009 11:07:13 -0500
Message-ID: <4989BD31.306@athenacr.com>
References: <49833DBC.7040607@athenacr.com>
 <20090130200330.GA12659@hmsreliant.think-freely.org>
 <49837F56.2020502@athenacr.com>
 <49838213.90700@cosmosbay.com>
 <20090131160333.GC23100@localhost.localdomain>
 <498723D9.5020509@athenacr.com>
 <20090203115502.GB28117@hmsreliant.think-freely.org>
 <498860AD.5010702@athenacr.com>
 <20090204011541.GB3650@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
To: netdev@vger.kernel.org
Return-path:
Received: from [64.95.46.209] ([64.95.46.209]:1067 "EHLO
 sprinkles.inp.in.athenacr.com" rhost-flags-FAIL-FAIL-OK-FAIL)
 by vger.kernel.org with ESMTP id S1753926AbZBDQHU (ORCPT );
 Wed, 4 Feb 2009 11:07:20 -0500
In-Reply-To: <20090204011541.GB3650@localhost.localdomain>
Sender: netdev-owner@vger.kernel.org
List-ID:

Neil Horman wrote:
> On Tue, Feb 03, 2009 at 10:20:13AM -0500, Kenny Chang wrote:
>> Neil Horman wrote:
>>> On Mon, Feb 02, 2009 at 11:48:25AM -0500, Kenny Chang wrote:
>>>> Neil Horman wrote:
>>>>> On Fri, Jan 30, 2009 at 11:41:23PM +0100, Eric Dumazet wrote:
>>>>>> Kenny Chang wrote:
>>>>>>> Ah, sorry, here's the test program attached.
>>>>>>>
>>>>>>> We've tried 2.6.28.1, but no, we haven't tried the 2.6.28.2 or the
>>>>>>> 2.6.29-rcX.
>>>>>>>
>>>>>>> Right now, we are trying to step through the kernel versions until we
>>>>>>> see where the performance drops significantly. We'll try 2.6.29-rc
>>>>>>> soon and post the result.
>>>>>>>
>>>>>> 2.6.29-rc contains UDP receive improvements (lockless).
>>>>>>
>>>>>> Problem is multicast handling was not yet updated, but could be :)
>>>>>>
>>>>>> I was asking you for "cat /proc/interrupts" because I believe you might
>>>>>> have a problem with NIC interrupts being handled by one CPU only
>>>>>> (when having problems).
>>>>>>
>>>>> That would be expected (if irqbalance is running), and desirable, since
>>>>> spreading high volume interrupts like NICs across multiple cores (or
>>>>> more specifically multiple L2 caches) is going to increase your cache
>>>>> line miss rate significantly and decrease rx throughput.
>>>>>
>>>>> Although you do have a point here: if the system isn't running
>>>>> irqbalance, and the NIC's irq affinity is spread across multiple L2
>>>>> caches, that would be a point of improvement performance-wise.
>>>>>
>>>>> Kenny, if you could provide the /proc/interrupts info along with
>>>>> /proc/cpuinfo and your stats that I asked about earlier, that would be
>>>>> a big help.
>>>>>
>>>>> Regards
>>>>> Neil
>>>>>
>>>> This is for a working setup.
>>>>
>>> Are these quad core systems? Or dual core w/ hyperthreading? I ask
>>> because in your working setup you have half the number of CPUs, and I was
>>> not sure whether you removed an entire package or just disabled
>>> hyperthreading.
>>>
>>> Neil
>>>
>> Yeah, these are quad core systems. The 8 CPU system is a dual-processor
>> quad-core. The other is my desktop, a single-CPU quad core.
>>
> Ok, so they're separate systems then. Did you actually experience drops on
> the 8-core system since the last reboot? I ask because even when it's
> distributed across all 8 cores, you only have about 500 total interrupts
> from the NIC, and if you did get drops, something more than just affinity
> is wrong.
>
> Regardless, spreading interrupts across cores is definitely a problem. As
> Eric says, quad core chips are actually 2x2 cores, so you'll want to either
> just run irqbalance to assign an appropriate affinity to the NIC, or
> manually look at each core's physical id and sibling id to assign affinity
> to a core or cores that share an L2 cache. As you've found, you may also
> need to disable msi interrupt mode on your bnx2 driver. That kinda stinks,
> but bnx2 IIRC isn't multiqueue, so it's not like msi provides you any real
> performance gain.
>
> Neil
>

Hi Neil,

Yeah, we've been rebooting this system left and right switching kernels.
The results are fairly consistent. We were able to set the irq affinities,
and as Wes had mentioned, what we see is that if we pin the softirq to one
core and pin the app to its sibling, we see really good performance. But as
we load up other cores, the machine reaches a breaking point where all hell
breaks loose and we drop a bunch. (We hadn't turned off msi, btw.)

While we were able to tune and adjust performance like that, in the end it
doesn't really explain the difference between earlier and recent kernels,
and it doesn't quite explain the difference between machines either.

You mentioned it would be good to see the interrupts for each kernel; in
light of the above information, would it still be useful for me to provide
that?

Kenny