From: Tom Herbert <therbert@google.com>
To: Ben Hutchings <bhutchings@solarflare.com>
Cc: netdev <netdev@vger.kernel.org>,
	Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Hemminger <shemminger@vyatta.com>,
	sf-linux-drivers <linux-net-drivers@solarflare.com>
Subject: Re: [RFC] Setting processor affinity for network queues
Date: Mon, 1 Mar 2010 12:46:26 -0800
Message-ID: <65634d661003011246x24aad2c5m84fc5ee3d809b0d8@mail.gmail.com>
In-Reply-To: <1267464102.2092.136.camel@achroite.uk.solarflarecom.com>

On Mon, Mar 1, 2010 at 9:21 AM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> With multiqueue network hardware or Receive/Transmit Packet Steering
> (RPS/XPS) we can spread out network processing across multiple
> processors.  The administrator should be able to control the number of
> channels and the processor affinity of each.
>
> By 'channel' I mean a bundle of:
> - a wakeup (IRQ or IPI)
> - a receive queue whose completions trigger the wakeup
> - a transmit queue whose completions trigger the wakeup
> - a NAPI instance scheduled by the wakeup, which handles the completions
>

Yes.  Also, on the receive side it is really cumbersome to do per-NAPI
RPS settings when the receiving NAPI instance is not exposed in
netif_rx.  Maybe a reference to the NAPI structure could be added to
the skb?  This could clean up RPS a lot.
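
For illustration, roughly what I have in mind (a sketch only: the
skb->napi pointer and the per-NAPI rps_cpus mask below do not exist
today, they are just meant to show the shape of it):

/* Hypothetical: the receiving NAPI context tags the skb. */
static inline void skb_set_napi(struct sk_buff *skb,
                                struct napi_struct *napi)
{
        skb->napi = napi;       /* would be a new sk_buff member */
}

/* netif_rx()/RPS could then read per-NAPI steering state directly
 * instead of deriving it from the device and queue mapping. */
static int rps_cpu_for_skb(const struct sk_buff *skb)
{
        if (skb->napi)
                return cpumask_first(&skb->napi->rps_cpus); /* hypothetical mask */
        return -1;      /* fall back to current behaviour */
}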

Tom

> Numbers of RX and TX queues used on a device do not have to match, but
> ideally they should.  For generality, you can substitute 'a receive
> and/or a transmit queue' above.  At the hardware level the numbers of
> queues could be different e.g. in the sfc driver a channel would be
> associated with 1 hardware RX queue, 2 hardware TX queues (with and
> without checksum offload) and 1 hardware event queue.
>
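As a reference point, the per-channel bookkeeping a driver keeps might
look roughly like this (purely illustrative; these are not the actual
sfc definitions, and rx_queue/tx_queue/ev_queue are stand-in types):

struct net_channel {
        struct napi_struct      napi;      /* scheduled by the wakeup */
        unsigned int            irq;       /* wakeup source (or IPI for RPS) */
        struct rx_queue         *rxq;      /* RX completions trigger the wakeup */
        struct tx_queue         *txq;      /* sfc would have two, +/- csum offload */
        struct ev_queue         *evq;      /* hardware event queue */
        int                     numa_node; /* node holding its structures */
        cpumask_t               affinity;  /* desired processor affinity */
};
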
> Currently we have a userspace interface for setting affinity of IRQs and
> a convention for naming each channel's IRQ handler, but no such
> interface for memory allocation.  For RX buffers this should not be a
> problem since they are normally allocated as older buffers are
> completed, in the NAPI context.  However, the DMA descriptor rings and
> driver structures for a channel should also be allocated on the NUMA
> node where NAPI processing is done.  Currently this allocation takes
> place when a net device is created or when it is opened, before an
> administrator has any opportunity to configure affinity.  Reallocation
> will normally require a complete stop to network traffic (at least on
> the affected queues) so it should not be done automatically when the
> driver detects a change in IRQ affinity.  There needs to be an explicit
> mechanism for changing it.
>
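To make the reallocation point concrete: the per-channel state would
follow the channel's (possibly changed) node rather than whichever CPU
happened to run probe/open.  A sketch, with struct net_channel and its
members invented for illustration:

static int netchan_alloc_on_node(struct net_channel *chan, int cpu)
{
        int node = cpu_to_node(cpu);

        /* Driver structures on the node that will run NAPI processing: */
        chan->rxq = kzalloc_node(sizeof(*chan->rxq), GFP_KERNEL, node);
        chan->txq = kzalloc_node(sizeof(*chan->txq), GFP_KERNEL, node);
        if (!chan->rxq || !chan->txq) {
                kfree(chan->rxq);       /* kfree(NULL) is a no-op */
                kfree(chan->txq);
                return -ENOMEM;
        }
        chan->numa_node = node;

        /* The DMA descriptor rings need the same node-aware treatment,
         * and the queues must be quiesced before any of this is redone
         * when the affinity moves to another node. */
        return 0;
}
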
> Devices using RPS will not generally be able to implement NUMA affinity
> for RX buffer allocation, but there will be a similar issue of processor
> selection for IPIs and NUMA node affinity for driver structures.  The
> proposed interface for setting processor affinity should cover this, but
> it is completely different from the IRQ affinity mechanism for hardware
> multiqueue devices.  That seems undesirable.
>
> Therefore I propose that:
>
> 1. Channels (or NAPI instances) should be exposed in sysfs.
> 2. Channels will have processor affinity, exposed read/write in sysfs
> (a rough sketch of this follows the list below).  Changing this
> triggers the networking core and driver to reallocate associated
> structures if the processor affinity has moved between NUMA nodes,
> and triggers the driver to set IRQ affinity.
> 3. The networking core will set the initial affinity for each channel.
> There may be global settings to control this.
> 4. Drivers should not set IRQ affinity.
> 5. irqbalanced should not set IRQ affinity for multiqueue network
> devices.
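
Regarding point 2, the userspace-facing side might look something like
the sketch below.  struct net_channel, to_net_channel() and
netchan_set_affinity() are invented here; only the general shape of the
sysfs plumbing is meant seriously:

static ssize_t channel_cpus_store(struct kobject *kobj,
                                  struct kobj_attribute *attr,
                                  const char *buf, size_t count)
{
        struct net_channel *chan = to_net_channel(kobj);
        cpumask_var_t mask;
        int err;

        if (!alloc_cpumask_var(&mask, GFP_KERNEL))
                return -ENOMEM;

        err = cpulist_parse(buf, mask);
        if (!err) {
                /* Reallocate on the new node if needed and update the
                 * IRQ affinity, as described in point 2. */
                err = netchan_set_affinity(chan, mask);
        }

        free_cpumask_var(mask);
        return err ? err : count;
}

An administrator could then do something like
"echo 4-7 > /sys/class/net/eth0/channels/ch2/cpus" (path invented) and
have the core handle the NUMA move and the IRQ affinity together.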
>
> (Most of this has been proposed already, but I'm trying to bring it all
> together.)
>
> Ben.
>
> --
> Ben Hutchings, Senior Software Engineer, Solarflare Communications
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
>
>
