From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Hutchings
Subject: Re: net: Automatic IRQ siloing for network devices
Date: Sun, 17 Apr 2011 19:38:59 +0100
Message-ID: <1303065539.5282.938.camel@localhost>
References: <1302898677-3833-1-git-send-email-nhorman@tuxdriver.com>
	<1302908069.2845.29.camel@bwh-desktop>
	<20110416015938.GB2200@neilslaptop.think-freely.org>
	<20110416091704.4fa62a50@nehalam>
	<20110417172010.GA3362@neilslaptop.think-freely.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Stephen Hemminger, netdev@vger.kernel.org, davem@davemloft.net,
	Thomas Gleixner
To: Neil Horman
Return-path: 
Received: from exchange.solarflare.com ([216.237.3.220]:36993 "EHLO
	exchange.solarflare.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753231Ab1DQSjE (ORCPT );
	Sun, 17 Apr 2011 14:39:04 -0400
In-Reply-To: <20110417172010.GA3362@neilslaptop.think-freely.org>
Sender: netdev-owner@vger.kernel.org
List-ID: 

On Sun, 2011-04-17 at 13:20 -0400, Neil Horman wrote:
> On Sat, Apr 16, 2011 at 09:17:04AM -0700, Stephen Hemminger wrote:
[...]
> > My gut feeling is that:
> > * kernel should default to a simple static sane irq policy without user
> >   space. This is especially true for multi-queue devices where the default
> >   puts all IRQs on one cpu.
> >
> That's not how it currently works, AFAICS. The default kernel policy is
> currently that cpu affinity for any newly requested irq is all cpus. Any
> restriction beyond that is the purview and doing of userspace (irqbalance
> or manual affinity setting).

Right. Though it may be reasonable for the kernel to use the hint as the
initial affinity for a newly allocated IRQ (not sure quite how we
determine that).

[...]
> > * irqbalance should not do the hacks it does to try and guess at network
> >   traffic.
> >
> Well, I can certainly agree with that, but I'm not sure what that looks
> like.
>
> I could envision something like:
>
> 1) Use irqbalance to do a one-time placement of interrupts, keeping a
> simple (possibly sub-optimal) policy, perhaps something like new irqs get
> assigned to the least loaded cpu within the numa node of the device the
> irq is originating from.
>
> 2) Add a udev event on the addition of new interrupts, to rerun irqbalance

Yes, making irqbalance more (or entirely) event-driven seems like a good
thing.

> 3) Add some exported information to identify processes that are high users
> of network traffic, and correlate that usage to an rxq/irq that produces
> that information (possibly some per-task proc file)
>
> 4) Create/expand an additional user space daemon to monitor the highest
> users of network traffic on various rxq/irqs (as identified in (3)) and
> restrict those processes' execution to those cpus which are on the same
> L2 cache as the irq itself. The cpuset cgroup could be useful in doing
> this, perhaps.

I just don't see that you're going to get processes associated with
specific RX queues unless you make use of flow steering. The 128-entry
flow hash indirection table is part of Microsoft's requirements for RSS,
so most multiqueue hardware is going to let you do limited flow steering
that way.
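For illustration only (a rough, untested sketch; the device name, table
size and two-queue split below are placeholders, and it assumes a driver
that implements the indirection-table ops), the table can be rewritten
from userspace through the ETHTOOL_SRXFHINDIR ioctl:

/*
 * Untested sketch: map the low half of the RSS hash space to RX queue 0
 * and the high half to RX queue 1 by rewriting the flow hash indirection
 * table.  "eth0" and INDIR_SIZE are placeholders; a real tool should
 * query the actual table size with ETHTOOL_GRXFHINDIR first.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

#define INDIR_SIZE 128	/* assumed table size */

int main(void)
{
	struct ethtool_rxfh_indir *indir;
	struct ifreq ifr;
	unsigned int i;
	int fd;

	indir = calloc(1, sizeof(*indir) +
		       INDIR_SIZE * sizeof(indir->ring_index[0]));
	if (!indir)
		return 1;
	indir->cmd = ETHTOOL_SRXFHINDIR;
	indir->size = INDIR_SIZE;
	for (i = 0; i < INDIR_SIZE; i++)
		indir->ring_index[i] = (i < INDIR_SIZE / 2) ? 0 : 1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);	/* placeholder device */
	ifr.ifr_data = (void *)indir;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_SRXFHINDIR");
		return 1;
	}

	return 0;
}

A daemon that wanted to tie particular processes to particular queues
would presumably drive something like this per flow, rather than splitting
the hash space statically as above.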
> Actually, as I read back to myself, that actually sounds kind of good to
> me. It keeps all the policy for this in user space, and minimizes what we
> have to add to the kernel to make it happen (some process information in
> /proc and another udev event). I'd like to get some feedback before I
> start implementing this, but I think this could be done. What do you
> think?

I don't think it's a good idea to override the scheduler dynamically like
this.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.