public inbox for netdev@vger.kernel.org
From: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"yong.zhang0@gmail.com" <yong.zhang0@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"arjan@linux.jf.intel.com" <arjan@linux.jf.intel.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH] irq: Add node_affinity CPU masks for smarter irqbalance hints
Date: Tue, 24 Nov 2009 11:53:14 -0800
Message-ID: <1259092394.2631.64.camel@ppwaskie-mobl2>
In-Reply-To: <4B0C2D85.7020200@gmail.com>

On Tue, 2009-11-24 at 11:01 -0800, Eric Dumazet wrote:
> Peter P Waskiewicz Jr wrote:
> 
> > That's exactly what we're doing in our 10GbE driver right now (isn't
> > pushed upstream yet, still finalizing our testing).  We spread to all
> > NUMA nodes in a semi-intelligent fashion when allocating our rings and
> > buffers.  The last piece is ensuring the interrupts tied to the various
> > queues all route to the NUMA nodes those CPUs belong to.  irqbalance
> > needs some kind of hint to make sure it does the right thing, which
> > today it does not.
> 
> sk_buff allocations should be done on the node of the CPU handling RX interrupts.

Yes, but we preallocate the buffers to minimize overhead when running
our interrupt routines.  Regardless, whatever queue we're filling with
those sk_buffs has an interrupt vector attached.  So wherever the
descriptor ring/queue and its associated buffers were allocated, that is
where the interrupt's affinity needs to be set.

> For rings, I am OK with irqbalance and driver cooperation, in case the
> admin doesn't want to change the defaults.
> 
> > 
> > I don't see how this is complex though.  Driver loads, allocates across
> > the NUMA nodes for optimal throughput, then writes CPU masks for the
> > NUMA nodes each interrupt belongs to.  irqbalance comes along and looks
> > at the new mask "hint," and then balances that interrupt within that
> > hinted mask.
> 
> So NUMA policy is given by the driver at load time?

I think it would have to.  Nobody else has insight into how the driver
allocated its resources.  So either the driver is told where to allocate
(see below), or the driver needs to indicate upwards how it allocated
its resources.

> An admin might choose to direct all NIC traffic to a given node because
> the machine has a mixed workload: 3 nodes out of 4 for the database
> workload, one node for network I/O...
> 
> So if an admin changes smp_affinity, is your driver able to reconfigure itself
> and re-allocate all its rings on the NUMA node chosen by the admin? This is
> what I would call complex.

No, we don't want to go the reallocation route.  This, I agree, is
very complex, and could be devastating: we'd basically be resetting
the driver whenever an interrupt moved, which could be a terrible DoS
vulnerability.

Jesse Brandeburg has a set of patches he's working on that will allow us
to bind an interface to a single node.  So in your example of 3 nodes
for DB workload and 1 for network I/O, the driver can be loaded and
directly bound to that 4th node.  The driver would then set the
node_affinity mask to the CPU mask of that single node.  But in these
deployments, a sysadmin who changes affinity in a way that flies
directly in the face of how resources are laid out is practicing poor
system administration.  I know it will happen, but I don't know how far
we need to go to protect sysadmins from shooting themselves in the foot
when tuning for performance.

Cheers,
-PJ
