From: Stephen Hemminger <shemminger@vyatta.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>,
netdev@vger.kernel.org, davem@davemloft.net
Subject: Re: net: Automatic IRQ siloing for network devices
Date: Sat, 16 Apr 2011 09:17:04 -0700
Message-ID: <20110416091704.4fa62a50@nehalam>
In-Reply-To: <20110416015938.GB2200@neilslaptop.think-freely.org>
On Fri, 15 Apr 2011 21:59:38 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:
> On Fri, Apr 15, 2011 at 11:54:29PM +0100, Ben Hutchings wrote:
> > On Fri, 2011-04-15 at 16:17 -0400, Neil Horman wrote:
> > > Automatic IRQ siloing for network devices
> > >
> > > At last year's netconf:
> > > http://vger.kernel.org/netconf2010.html
> > >
> > > Tom Herbert gave a talk in which he outlined some of the things we can do to
> > > improve scalability and throughput in our network stack.
> > >
> > > One of the big items on the slides was the notion of siloing irqs: the
> > > practice of setting irq affinity to a cpu or cpu set that is 'close' to the
> > > process consuming the data. The idea is to ensure that the hard irq for a
> > > nic (and its subsequent softirq) executes on the same cpu as the process
> > > consuming the data, increasing cache hit rates and improving overall
> > > throughput.
> > >
> > > I had taken an idea away from that talk, and have finally gotten around to
> > > implementing it. One of the problems with the above approach is that it's
> > > all quite manual. I.e., to properly enact this siloing, you have to do a
> > > few things by hand (sketched in code below):
> > >
> > > 1) decide which process is the heaviest user of a given rx queue
> > > 2) restrict the cpus which that task will run on
> > > 3) identify the irq which the rx queue in (1) maps to
> > > 4) manually set the affinity for the irq in (3) to cpus which match the cpus in
> > > (2)
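To make the manual recipe concrete, here is a minimal user-space sketch of
steps (2) and (4). The pid, irq, and cpu numbers are hypothetical; steps (1)
and (3) are still left to the admin, e.g. via per-queue statistics and
/proc/interrupts.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

static int pin_task(pid_t pid, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(pid, sizeof(set), &set);	/* step (2) */
}

static int pin_irq(int irq, int cpu)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	/* step (4): hex cpu bitmask; this sketch only handles cpus 0-31 */
	fprintf(f, "%x\n", 1u << cpu);
	return fclose(f);
}

int main(void)
{
	/* Hypothetical pid and irq from steps (1) and (3); both pinned to cpu 2. */
	if (pin_task(4321, 2) || pin_irq(67, 2)) {
		perror("affinity");
		return 1;
	}
	return 0;
}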
> > [...]
> >
> > This presumably works well with small numbers of flows and/or large
> > numbers of queues. You could scale it up somewhat by manipulating the
> > device's flow hash indirection table, but that usually only has 128
> > entries. (Changing the indirection table is currently quite expensive,
> > though that could be changed.)
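For reference, "manipulating the indirection table" here means rewriting the
rx queue each hash bucket points at. A rough sketch using the
ETHTOOL_SRXFHINDIR ioctl follows; the device name, queue count, and the
128-entry table size are assumptions, and real code should query the size
first with ETHTOOL_GRXFHINDIR.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	unsigned int i, size = 128;	/* assumed table size */
	struct ethtool_rxfh_indir *indir;
	struct ifreq ifr;
	int fd;

	indir = calloc(1, sizeof(*indir) + size * sizeof(indir->ring_index[0]));
	if (!indir)
		return 1;
	indir->cmd = ETHTOOL_SRXFHINDIR;
	indir->size = size;
	for (i = 0; i < size; i++)
		indir->ring_index[i] = i % 4;	/* spread buckets over 4 rx queues */

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);	/* hypothetical device */
	ifr.ifr_data = (void *)indir;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("SIOCETHTOOL");
		return 1;
	}
	return 0;
}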
> >
> > I see RFS and accelerated RFS as the only reasonable way to scale to
> > large numbers of flows. And as part of accelerated RFS, I already did
> > the work for mapping CPUs to IRQs (note, not the other way round). If
> > IRQ affinity keeps changing then it will significantly undermine the
> > usefulness of hardware flow steering.
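For readers unfamiliar with the knobs involved: RFS is enabled entirely from
user space by sizing the global flow table and each rx queue's share of it.
A minimal sketch, with an assumed device name and table size:

#include <stdio.h>

static int write_val(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* Global flow table: sized for the expected number of concurrent flows. */
	if (write_val("/proc/sys/net/core/rps_sock_flow_entries", "32768"))
		return 1;
	/* Per-queue count: rps_sock_flow_entries / nr_rx_queues; a single
	 * queue on a hypothetical device is shown. */
	if (write_val("/sys/class/net/eth0/queues/rx-0/rps_flow_cnt", "32768"))
		return 1;
	return 0;
}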
> >
> > Now I'm not saying that your approach is useless. There is more
> > hardware out there with flow hashing than with flow steering, and there
> > are presumably many systems with small numbers of active flows. But I
> > think we need to avoid having two features that conflict and a
> > requirement for administrators to make a careful selection between them.
> >
> > Ben.
> >
> I hear what you're saying and I agree, there's no point in having features work
> against each other. That said, I'm not sure I agree that these features have to
> work against one another, nor does a sysadmin need to make a choice between the
> two. Note the third patch in this series. Making this work requires that
> network drivers wanting to participate in this affinity algorithm opt in by
> using the request_net_irq macro to attach the interrupt to the rfs affinity code
> that I added. There's no reason that a driver which supports hardware that still
> uses flow steering can't opt out of this algorithm, and as a result irqbalance
> will still treat those interrupts as it normally does. And for those drivers
> which do opt in, irqbalance can take care of affinity assignment using the
> provided hint. No need for sysadmin intervention.
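As a rough illustration of that opt-in: the request_net_irq macro comes from
Neil's patch series, so the exact signature shown below is an assumption,
modeled on request_irq() plus the net_device whose affinity guidance should
drive the vector. All the driver names are hypothetical.

#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct hypothetical_rxq {	/* illustrative per-queue state */
	int vec;		/* MSI-X vector number */
	char name[16];
};

static irqreturn_t hypothetical_rx_isr(int irq, void *data)
{
	/* ... napi scheduling would go here ... */
	return IRQ_HANDLED;
}

static int hypothetical_rxq_request_irq(struct net_device *dev,
					struct hypothetical_rxq *q)
{
	/*
	 * Opt in: attach the vector to dev's rfs affinity guidance so
	 * irqbalance can follow the exported hint.  A driver whose
	 * hardware does its own flow steering keeps calling plain
	 * request_irq() and thereby opts out.
	 */
	return request_net_irq(q->vec, hypothetical_rx_isr, 0,
			       q->name, q, dev);
}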
>
> I'm sure there can be improvements made to this code, but I think there's less
> conflict between the work you've done and this code than there appears to be at
> first blush.
>
My gut feeling is that:
* The kernel should default to a simple, sane static irq policy without help
  from user space. This is especially true for multi-queue devices, where the
  default puts all IRQs on one cpu.
* irqbalance should do a one-shot rearrangement at boot (roughly sketched
  below), and rearrange again when new IRQs are requested. The kernel should
  have the capability to notify user space (uevent?) when IRQs are added or
  removed.
* Let the scheduler make decisions about migrating processes (rather than
  letting irqbalance migrate IRQs).
* irqbalance should not do the hacks it does to try to guess at network
  traffic.
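As a rough sketch of what that one-shot rearrangement could look like: walk
/proc/irq/<n>/ and assign each irq to the next online cpu, round-robin. This
is a stand-in for a slimmed-down irqbalance, not its actual behaviour; error
handling is minimal and the affinity mask is limited to one 32-bit word.

#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
	DIR *dir = opendir("/proc/irq");
	struct dirent *d;
	int cpu = 0;

	if (!dir || ncpus < 1 || ncpus > 32)
		return 1;

	while ((d = readdir(dir)) != NULL) {
		char path[64];
		FILE *f;

		/* numeric entries only; skips ".", "..", "default_smp_affinity" */
		if (!isdigit((unsigned char)d->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity",
			 d->d_name);
		f = fopen(path, "w");
		if (!f)
			continue;	/* some irqs refuse affinity changes */
		fprintf(f, "%x\n", 1u << cpu);
		fclose(f);
		cpu = (cpu + 1) % ncpus;
	}
	closedir(dir);
	return 0;
}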
--
Thread overview: 20+ messages
2011-04-15 20:17 net: Automatic IRQ siloing for network devices Neil Horman
2011-04-15 20:17 ` [PATCH 1/3] irq: Add registered affinity guidance infrastructure Neil Horman
2011-04-16 0:22 ` Thomas Gleixner
2011-04-16 2:11 ` Neil Horman
2011-04-15 20:17 ` [PATCH 2/3] net: Add net device irq siloing feature Neil Horman
2011-04-15 22:49 ` Ben Hutchings
2011-04-16 1:49 ` Neil Horman
2011-04-16 4:52 ` Stephen Hemminger
2011-04-16 6:21 ` Eric Dumazet
2011-04-16 11:55 ` Neil Horman
2011-04-15 20:17 ` [PATCH 3/3] net: Adding siloing irqs to cxgb4 driver Neil Horman
2011-04-15 22:54 ` net: Automatic IRQ siloing for network devices Ben Hutchings
2011-04-16 0:50 ` Ben Hutchings
2011-04-16 1:59 ` Neil Horman
2011-04-16 16:17 ` Stephen Hemminger [this message]
2011-04-17 17:20 ` Neil Horman
2011-04-17 18:38 ` Ben Hutchings
2011-04-18 1:08 ` Neil Horman
2011-04-18 21:51 ` Ben Hutchings
2011-04-19 0:52 ` Neil Horman