From: Andrew Dickinson
Subject: Re: receive-side performance issue (ixgbe, core-i7, softirq cpu%)
Date: Fri, 29 Jan 2010 00:02:19 -0800
To: "Brandeburg, Jesse"
Cc: "netdev@vger.kernel.org"

I might have misspoken about HPET.  The 4.6Mpps is with 2.6.32.4
vanilla, HPET on.

Either way, I'm happy now ;-P

-A

On Thu, Jan 28, 2010 at 10:06 PM, Andrew Dickinson wrote:
> Short response: CONFIG_HPET was the dirty little bastard!
>
> Answering your questions below in case somebody else stumbles across
> this thread...
>
> On Thu, Jan 28, 2010 at 4:18 PM, Brandeburg, Jesse wrote:
>> On Thu, 28 Jan 2010, Andrew Dickinson wrote:
>>> I'm running into some unexpected performance issues.  I say
>>> "unexpected" because I was running the same tests on this same box 5
>>> months ago and getting very different (and much better) results.
>>
>> Can you try turning off the cpuspeed service, C-states in the BIOS,
>> and GV3 (aka SpeedStep) support in the BIOS?
>
> Yup, everything's on "maximum performance" in my BIOS's vernacular (HP
> DL360 G6); no C-states, etc.
>
>> Have you upgraded your BIOS since then?
>
> Not that I'm aware of, but our provisioning folks might have done
> something crazy.
>
>> I agree you should be able to see better numbers; I suspect you are
>> getting cross-CPU traffic that is limiting your throughput.
>
> That's what I would have suspected as well.
>
>> How many flows are you pushing?
>
> I'm pushing two streams of traffic, one in each direction.  Each
> stream is defined as follows:
>
>     North-bound:
>         L2: a0a0a0a0a0a0 -> b0b0b0b0b0b0
>         L3: RAND(10.0.0.0/16) -> RAND(100.0.0.0/16)
>         L4: UDP with random data
>     South-bound is the reverse.
>
> where "RAND(CIDR)" is a random address within that CIDR (I'm using a
> hardware traffic generator).
>
>> Another idea is to compile the "perf" tool in the tools/perf directory
>> of the kernel and run "perf record -a -- sleep 10" while running at
>> steady state, then look at the output of "perf report" to get an idea
>> of which functions are eating all the CPU time.
>>
>> Did you change to the "tickless" kernel?  We've also found that
>> routing performance improves dramatically by disabling tickless,
>> disabling the preemptive kernel, and setting HZ=100.  What about
>> CONFIG_HPET?
>
> yes, yes, yes, and no...
>
> I changed CONFIG_HPET to n, rebooted, and retested...
>
> ta-da!
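>
> For anyone checking their own tree, the quickest way I know to eyeball
> all of those knobs at once is something like this (symbol names are
> from my tree and may differ in yours):
>
>     grep -E '^(# )?CONFIG_(NO_HZ|PREEMPT|HZ|HPET)' .config
>
> which on my box now shows, among other lines:
>
>     # CONFIG_NO_HZ is not set
>     CONFIG_PREEMPT_NONE=y
>     CONFIG_HZ=100
>     # CONFIG_HPET is not set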
>
>> You should try the kernel that the scheduler fixes went into (maybe
>> .31?) or at least try 2.6.32.6 so you've tried something fully up to
>> date.
>
> I'll give it a whirl :D
>
>>> === Background ===
>>>
>>> The box is a dual Core i7 box with a pair of Intel 82598EBs.  I'm
>>> running 2.6.30 with the in-kernel ixgbe driver.  My tests 5 months
>>> ago were using 2.6.30-rc3 (with a tiny patch from David Miller as
>>> seen here:
>>> http://kerneltrap.org/mailarchive/linux-netdev/2009/4/30/5605924).
>>> The box is configured with both NICs in a bridge; normally I'm doing
>>> some packet processing using ebtables, but for the sake of keeping
>>> things simple I'm not doing anything special... just straight
>>> bridging (no ebtables rules, etc.).  I'm not running irqbalance;
>>> instead I'm pinning my interrupts, one per core.  I've re-read and
>>> double-checked various settings against Intel's README (e.g. GSO
>>> off, TSO off, etc.).
>>>
>>> In my previous tests, I was able to pass 3+Mpps regardless of how
>>> that was divided across the two NICs (i.e. 3Mpps all in one
>>> direction, 1.5Mpps in each direction simultaneously, etc.).  Now I'm
>>> hardly able to exceed about 750kpps x 2 (i.e. 750kpps in each
>>> direction), and I can't do more than 750kpps in one direction even
>>> when the other direction carries no traffic.
>>>
>>> Unfortunately, I didn't take very good notes when I did this last
>>> time, so I don't have my previous .config and I'm not 100% positive
>>> I've got identical ethtool settings, etc.  That said, I've worked
>>> through seemingly every combination of factors I can think of and
>>> I'm still unable to see the old performance (NUMA on/off,
>>> Hyperthreading on/off, various IRQ coalescing settings, etc.).
>>>
>>> I have two identical boxes and they both see the same thing, so a
>>> hardware issue seems unlikely.  My next step is to grab 2.6.30-rc3
>>> and see if I can reproduce the good performance with that kernel
>>> again, to determine whether there was a regression between
>>> 2.6.30-rc3 and 2.6.30... but I'm skeptical that that's the issue,
>>> since I'm sure other people would have noticed it as well.
>>>
>>> === What I'm seeing ===
>>>
>>> CPU% (almost entirely softirq time, which is expected) ramps
>>> extremely quickly as the packet rate increases.  The table below
>>> shows the packet rate on the left ("150 x 2" means 150kpps in each
>>> direction simultaneously) and the CPU utilization (as measured by
>>> %si in top) on the right:
>>>
>>> 150 x 2:   4%
>>> 300 x 2:   8%
>>> 450 x 2:  18%
>>> 483 x 2:  50%
>>> 525 x 2:  66%
>>> 600 x 2:  85%
>>> 750 x 2: 100% (and dropping frames)
>>>
>>> I _am_ seeing interrupts spread nicely across cores, so in the
>>> "150 x 2" case that's about 4% soft-interrupt time on each of the 16
>>> cores.  The CPUs are otherwise idle, bar a small amount of hardware
>>> interrupt time (less than 1%).
>>>
>>> === Where it gets weird... ===
>>>
>>> Trying to isolate the problem, I added an ebtables rule to drop
>>> everything on the FORWARD chain.  I was expecting the CPU
>>> utilization to drop since I'd no longer be dealing with the TX
>>> side... no change.
>>>
>>> I then switched from a bridge to a route-based setup: I tore down
>>> the bridge, enabled ip_forward, set up some IPs and route entries,
>>> etc. (roughly the commands sketched below).  Nothing changed; CPU
>>> utilization is identical to what's shown above.  Additionally, if I
>>> add an iptables drop on FORWARD, the CPU utilization remains
>>> unchanged (just like in the bridging case above).
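>>>
>>> For concreteness, the drop tests and the bridge-to-route switch were
>>> nothing fancier than the following (interface names and addresses
>>> are illustrative, from my setup):
>>>
>>>     # bridge case: drop all bridged traffic on the FORWARD chain
>>>     ebtables -A FORWARD -j DROP
>>>
>>>     # switch to routing: tear down the bridge...
>>>     brctl delif br0 eth0
>>>     brctl delif br0 eth1
>>>     ip link set br0 down
>>>     brctl delbr br0
>>>
>>>     # ...and route between the generator's two /16s instead
>>>     echo 1 > /proc/sys/net/ipv4/ip_forward
>>>     ip addr add 10.0.255.1/16 dev eth0
>>>     ip addr add 100.0.255.1/16 dev eth1
>>>
>>>     # routed case: drop all forwarded traffic
>>>     iptables -A FORWARD -j DROP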
>>>
>>> The point that [I think] I'm driving at is that there's something
>>> fishy going on on the receive side.  I wish I could point to
>>> something more specific or to a section of code, but I haven't been
>>> able to pare this down to anything more granular in my testing.
>>>
>>> === Questions ===
>>>
>>> Has anybody seen this before?  If so, what was wrong?
>>> Do you have any recommendations on things to try (either as guesses
>>> or, even better, to help eliminate possibilities)?
>>> And along those lines... can anybody think of any possible reasons
>>> for this?
>>
>> Hope the above helped.
>>
>>> This is so frustrating since I _know_ this hardware is capable of so
>>> much more.  It's relatively painless for me to re-run tests in my
>>> lab, so feel free to throw something at me that you think will
>>> stick :D
>>
>> Last I checked, I recall that with an 82599 I was pushing ~4.5
>> million 64-byte packets per second (bidirectional, no drop), after
>> disabling irqbalance, with 16 tx/rx queues set with the
>> set_irq_affinity.sh script (available in our ixgbe-foo.tar.gz from
>> SourceForge).  An 82598 should be a bit lower, but can probably get
>> close to that number.
>>
>> I haven't run the test lately though; at that point I was likely on
>> 2.6.30-ish.
>>
>> Jesse
>
> Thank you so much... I wish I'd sent this email out a week ago ;-P
>
> -A
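P.S.  For completeness, "pinning my interrupts, one per core" in the
background section above is just the usual smp_affinity loop.  A rough
sketch (it assumes the ixgbe queue interrupts show up as "eth0-*" in
/proc/interrupts; adjust the pattern for your interface names):

    #!/bin/bash
    # Round-robin one NIC queue IRQ per CPU core (run as root).
    cpu=0
    ncpus=$(grep -c '^processor' /proc/cpuinfo)
    for irq in $(awk -F: '/eth0-/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts); do
        # smp_affinity takes a hexadecimal CPU mask
        printf '%x' $((1 << cpu)) > /proc/irq/$irq/smp_affinity
        cpu=$(( (cpu + 1) % ncpus ))
    done

The set_irq_affinity.sh script Jesse mentions presumably does a more
robust version of the same thing.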