From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: Multicast packet loss
Date: Mon, 02 Feb 2009 22:31:41 +0100
Message-ID: <4987663D.6080802@cosmosbay.com>
References: <49833DBC.7040607@athenacr.com> <20090130200330.GA12659@hmsreliant.think-freely.org> <49837F56.2020502@athenacr.com> <49838213.90700@cosmosbay.com> <49859847.9010206@cosmosbay.com> <20090202134523.GA13369@hmsreliant.think-freely.org> <498725F4.2010205@cosmosbay.com> <20090202182212.GA17950@hmsreliant.think-freely.org> <498757AA.8010101@cosmosbay.com> <4987610D.6040902@athenacr.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: netdev@vger.kernel.org
To: Wes Chow
In-Reply-To: <4987610D.6040902@athenacr.com>

Wes Chow wrote:
>
>
> Eric Dumazet wrote:
>> Wes Chow wrote:
>>> (I'm Kenny's colleague, and I've been doing the kernel builds)
>>>
>>> First I'd like to note that there were a lot of bnx2 NAPI changes
>>> between 2.6.21 and 2.6.22. As a reminder, 2.6.21 shows tiny amounts
>>> of packet loss, whereas loss in 2.6.22 is significant.
>>>
>>> Second, some CPU affinity info: if I do like Eric and pin all of the
>>> apps onto a single CPU, I see no packet loss. Also, I do *not* see
>>> ksoftirqd show up in top at all!
>>>
>>> If I pin half the processes on one CPU and the other half on another
>>> CPU, one ksoftirqd process shows up in top and completely pegs one
>>> CPU. My packet loss in that case is significant (25%).
>>>
>>> Now, the strange case: if I pin 3 processes to one CPU and 1 process
>>> to another, I get about 25% packet loss and ksoftirqd pins one CPU.
>>> However, one of the apps takes significantly less CPU than the
>>> others, and all apps lose the *exact same number of packets*. In all
>>> other situations where we see packet loss, the actual number lost
>>> per application instance appears random.
>>
>> You see the same number of packets lost because they are lost at the
>> NIC level.
>
> Understood.
>
> I have a new observation: if I pin processes to just CPUs 0 and 1, I see
> no packet loss. Pinning to 0 and 2, I do see packet loss. Pinning 2 and
> 3, no packet loss. 4 & 5 - no packet loss, 6 & 7 - no packet loss. Any
> other combination appears to produce loss (though I have not tried all
> 28 combinations, this seems to be the case).
>
> At first I thought maybe it had to do with processes pinned to the same
> CPU, but different cores. The machine is a dual quad core, which means
> that CPUs 0-3 should be a physical CPU, correct? Pinning to 0/2 and 0/3
> produces packet loss.

A quad core is really a 2 x 2 core: the L2 cache is split into two
blocks, one block used by CPU0/1, the other by CPU2/3.

You are at the limit of the machine with such a workload, so as soon as
your CPUs have to transfer 64-byte cache lines between those two L2
blocks, you lose.

>
> I've also noticed that it does not matter which of the working pairs I
> pin to. For example, pinning 5 processes in any combination on either
> 0/1 produces no packet loss, and pinning all 5 to just CPU 0 also
> produces no packet loss.
>
> The failures are also sudden. In all of the working cases mentioned
> above, I don't see ksoftirqd in top at all. But when I run 6 processes
> on a single CPU, ksoftirqd shoots up to 100% and I lose a huge number
> of packets.
>
>>
>> Normally, the softirq runs on the same CPU (the one handling the hard
>> irq).
>
> What determines which CPU the hard irq occurs on?
>

Check /proc/irq/{irqnumber}/smp_affinity

If you want IRQ16 to be served only by CPU0:

echo 1 >/proc/irq/16/smp_affinity
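For reference, smp_affinity is a hexadecimal bitmask in which bit N set
means CPU N may service the interrupt (so "1" above is CPU0 only). A
small sketch of computing the mask for an arbitrary CPU set, rather than
working it out by hand; the IRQ number 16 is just the example from this
mail, and the sysfs path in the last comment is an assumption about what
your kernel exposes:

```shell
# smp_affinity is a hex bitmask: bit N = CPU N.
cpus="2 3"              # the CPUs that should service the IRQ
mask=0
for cpu in $cpus; do
  mask=$(( mask | (1 << cpu) ))
done
printf '%x\n' "$mask"   # prints: c   (binary 1100 = CPUs 2 and 3)

# As root, apply it (IRQ 16 as in the example above):
#   printf '%x' "$mask" > /proc/irq/16/smp_affinity

# To see which CPUs share an L2 block (on kernels exposing cache
# topology in sysfs; index2 is usually L2):
#   cat /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list
```

The application side of the pinning discussed above can be done the same
way with taskset, e.g. `taskset -c 0,1 ./receiver` to restrict a process
to CPUs 0 and 1.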