Re: net: Automatic IRQ siloing for network devices

From: Stephen Hemminger <shemminger@vyatta.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>,
	netdev@vger.kernel.org, davem@davemloft.net
Subject: Re: net: Automatic IRQ siloing for network devices
Date: Sat, 16 Apr 2011 09:17:04 -0700
Message-ID: <20110416091704.4fa62a50@nehalam>
In-Reply-To: <20110416015938.GB2200@neilslaptop.think-freely.org>

On Fri, 15 Apr 2011 21:59:38 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Fri, Apr 15, 2011 at 11:54:29PM +0100, Ben Hutchings wrote:
> > On Fri, 2011-04-15 at 16:17 -0400, Neil Horman wrote:
> > > Automatic IRQ siloing for network devices
> > > 
> > > At last year's netconf:
> > > http://vger.kernel.org/netconf2010.html
> > > 
> > > Tom Herbert gave a talk in which he outlined some of the things we can do to
> > > improve scalability and throughput in our network stack.
> > > 
> > > One of the big items on the slides was the notion of siloing irqs, which is the
> > > practice of setting irq affinity to a cpu or cpu set that was 'close' to the
> > > process that would be consuming data.  The idea was to ensure that a hard irq
> > > for a nic (and its subsequent softirq) would execute on the same cpu as the
> > > process consuming the data, increasing cache hit rates and speeding up overall
> > > throughput.
> > > 
> > > I had taken an idea away from that talk, and have finally gotten around to
> > > implementing it.  One of the problems with the above approach is that it's all
> > > quite manual.  I.e. to properly enact this siloing, you have to do a few things
> > > by hand:
> > > 
> > > 1) decide which process is the heaviest user of a given rx queue 
> > > 2) restrict the cpus which that task will run on
> > > 3) identify the irq which the rx queue in (1) maps to
> > > 4) manually set the affinity for the irq in (3) to cpus which match the cpus in
> > > (2)
> > [...]
> > 
> > This presumably works well with small numbers of flows and/or large
> > numbers of queues.  You could scale it up somewhat by manipulating the
> > device's flow hash indirection table, but that usually only has 128
> > entries.  (Changing the indirection table is currently quite expensive,
> > though that could be changed.)
> > 
> > I see RFS and accelerated RFS as the only reasonable way to scale to
> > large numbers of flows.  And as part of accelerated RFS, I already did
> > the work for mapping CPUs to IRQs (note, not the other way round).  If
> > IRQ affinity keeps changing then it will significantly undermine the
> > usefulness of hardware flow steering.
> > 
> > Now I'm not saying that your approach is useless.  There is more
> > hardware out there with flow hashing than with flow steering, and there
> > are presumably many systems with small numbers of active flows.  But I
> > think we need to avoid having two features that conflict and a
> > requirement for administrators to make a careful selection between them.
> > 
> > Ben.
> > 
> I hear what you're saying and I agree, there's no point in having features work
> against each other.  That said, I'm not sure I agree that these features have to
> work against one another, nor does a sysadmin need to make a choice between the
> two.  Note the third patch in this series.  Making this work requires that
> network drivers wanting to participate in this affinity algorithm opt in by
> using the request_net_irq macro to attach the interrupt to the rfs affinity code
> that I added.  There's no reason that a driver which supports hardware that still
> uses flow steering can't opt out of this algorithm, and as a result irqbalance
> will still treat those interrupts as it normally does.  And for those drivers
> which do opt in, irqbalance can take care of affinity assignment, using the
> provided hint.  No need for sysadmin intervention.
> 
> I'm sure there can be improvements made to this code, but I think there's less
> conflict between the work you've done and this code than there appears to be at
> first blush.
> 
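
Steps (2) and (4) of the manual procedure quoted above can be sketched in C as follows. This is a minimal illustration, not code from the patch series: the helper names are hypothetical, it assumes every target CPU is below 32 so that /proc/irq/<N>/smp_affinity takes a single hex word, and writing that file requires root. Steps (1) and (3), choosing the task and finding the rx queue's IRQ number (e.g. from /proc/interrupts), are left to the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: build the hex mask that /proc/irq/<N>/smp_affinity
 * accepts.  Assumes every CPU is < 32, so one 32-bit word is enough. */
static int format_irq_mask(const int *cpus, int ncpus, char *buf, size_t len)
{
    uint32_t mask = 0;

    assert(buf != NULL && len > 0);
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i] < 0 || cpus[i] >= 32)
            return -1;
        mask |= UINT32_C(1) << cpus[i];
    }
    if (mask == 0)
        return -1;
    int n = snprintf(buf, len, "%x", (unsigned int)mask);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Step (2): restrict the consuming task to the chosen CPUs. */
static int pin_task(pid_t pid, const int *cpus, int ncpus)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            return -1;
        CPU_SET(cpus[i], &set);
    }
    return sched_setaffinity(pid, sizeof(set), &set);
}

/* Step (4): point the rx queue's IRQ at the same CPUs (requires root). */
static int set_irq_affinity(int irq, const int *cpus, int ncpus)
{
    char path[64];
    char mask[16];

    if (format_irq_mask(cpus, ncpus, mask, sizeof(mask)) != 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = (fprintf(f, "%s\n", mask) < 0) ? -1 : 0;
    /* stdio buffers the write, so a rejected mask surfaces at fclose() */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

For example, calling pin_task() and then set_irq_affinity() with the CPU list {2, 3} writes the mask "c" to the IRQ's smp_affinity file; this per-flow manual bookkeeping is what the patch series tries to automate.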

My gut feeling is that:
  * kernel should default to a simple static sane irq policy without user
    space.  This is especially true for multi-queue devices where the default
    puts all IRQs on one cpu.

  * irqbalance should do a one-shot rearrangement at boot up. It should rearrange
    when new IRQs are requested. The kernel should have capability to notify
    userspace (uevent?) when IRQs are added or removed.

  * Let scheduler make decisions about migrating processes (rather than let irqbalance
    migrate IRQs).

  * irqbalance should not do the hacks it does to try and guess at network traffic.

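The first point, a static default that does not pile every queue's IRQ onto one CPU, can be illustrated with a round-robin spread. This is only a sketch of one possible "simple static sane" policy, not something the kernel implements; the function name is hypothetical, and it ignores NUMA placement of the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical static default: queue q's IRQ goes to
 * online_cpus[q % n_online], so queues are spread evenly instead of
 * all landing on the first CPU.  Returns -1 if no CPU is online. */
static int spread_queue_irqs(const int *online_cpus, size_t n_online,
                             size_t nqueues, int *queue_cpu)
{
    if (n_online == 0)
        return -1;
    assert(online_cpus != NULL && (nqueues == 0 || queue_cpu != NULL));
    for (size_t q = 0; q < nqueues; q++)
        queue_cpu[q] = online_cpus[q % n_online];
    return 0;
}
```

Because the mapping depends only on the queue count and the set of online CPUs, it needs recomputing only when IRQs are added or removed, which fits the one-shot rearrangement suggested in the second point.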

--

Thread overview: 20+ messages
2011-04-15 20:17 net: Automatic IRQ siloing for network devices Neil Horman
2011-04-15 20:17 ` [PATCH 1/3] irq: Add registered affinity guidance infrastructure Neil Horman
2011-04-16  0:22   ` Thomas Gleixner
2011-04-16  2:11     ` Neil Horman
2011-04-15 20:17 ` [PATCH 2/3] net: Add net device irq siloing feature Neil Horman
2011-04-15 22:49   ` Ben Hutchings
2011-04-16  1:49     ` Neil Horman
2011-04-16  4:52       ` Stephen Hemminger
2011-04-16  6:21         ` Eric Dumazet
2011-04-16 11:55           ` Neil Horman
2011-04-15 20:17 ` [PATCH 3/3] net: Adding siloing irqs to cxgb4 driver Neil Horman
2011-04-15 22:54 ` net: Automatic IRQ siloing for network devices Ben Hutchings
2011-04-16  0:50   ` Ben Hutchings
2011-04-16  1:59   ` Neil Horman
2011-04-16 16:17     ` Stephen Hemminger [this message]
2011-04-17 17:20       ` Neil Horman
2011-04-17 18:38         ` Ben Hutchings
2011-04-18  1:08           ` Neil Horman
2011-04-18 21:51             ` Ben Hutchings
2011-04-19  0:52               ` Neil Horman