Re: [LSF/MM TOPIC] multiqueue and interrupt assignment

From: Bart Van Assche <bart.vanassche@sandisk.com>
To: Hannes Reinecke <hare@suse.de>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-scsi@vger.kernel.org" <Linux-scsi@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [LSF/MM TOPIC] multiqueue and interrupt assignment
Date: Tue, 2 Feb 2016 10:23:23 -0800
Message-ID: <56B0F41B.8010706@sandisk.com>
In-Reply-To: <56B0D9CF.1080001@suse.de>

On 02/02/2016 08:31 AM, Hannes Reinecke wrote:
> here's another topic which I've hit during my performance tests:
> How should interrupt affinity be handled with blk-multiqueue?
>
> The problem is that blk-mq assumes a certain CPU-to-queue
> mapping, _and_ the 'queue' in blk-mq terminology is actually
> a submission/completion queue pair.
>
> To achieve optimal performance one should set the interrupt affinity
> for a given (hardware) queue to the matching (blk-mq) queue.
> But typically the interrupt affinity has to be set during HBA setup,
> i.e. well before any queues are allocated.
> This leaves us with three choices:
> - Outguess the blk-mq algorithm in the driver and set the
>    interrupt affinity during HBA setup
> - Add some callbacks to coordinate interrupt affinity between
>    driver and blk-mq
> - Defer it to manual assignment, incurring the risk of
>    suboptimal performance.
>
> At LSF/MM I would like to have a discussion on how interrupt
> affinity should be handled for blk-mq, and whether a generic method
> is possible or desirable.
> Also there is the issue of certain drivers (e.g. lpfc) which normally
> set interrupt affinity themselves but disable this for multiqueue,
> which results in abysmal multiqueue performance compared with
> single queue :-(
>
> As a side note, what does blk-mq do if the interrupt affinity is
> _deliberately_ set wrong? I.e., what if the completions for one command
> arrive on completely the wrong queue? Discard the completion? Move
> it to the correct queue?

Hello Hannes,

This topic indeed needs further attention. I also encountered this
challenge while adding scsi-mq support to the SRP initiator driver. What
I learned while working on the SRP driver is the following:
- Although I agree that requests and interrupts should be processed on
   the same processor (same physical chip) if the request has been
   submitted from the CPU closest to the HBA, I'm not convinced that
   processing request completions and interrupts on the same CPU core
   yields the best performance. I would like some freedom to remain in
   how interrupts are assigned to CPU cores.
- In several older NUMA systems (e.g. Nehalem) the distance from
   processor to PCI adapter is the same for all processors. However, in
   current NUMA systems (Sandy Bridge and later) the PCIe controller is
   integrated into the processor, so access latency to a given PCI
   adapter is typically optimal from only one processor. The question
   then becomes which code should take the QPI latency penalty: the
   interrupt handler or the blk-mq request completion processing code?
- All HBAs I know of support reassignment of an interrupt to another
   CPU core through /proc/irq/<n>/smp_affinity, so I was surprised to
   read that you encountered an HBA for which interrupt affinity has to
   be set at driver load time.
- For HBAs that support multiple MSI-X vectors we need an approach for
   associating blk-mq hardware queues with MSI-X vectors. The ib_srp
   driver assumes that the MSI-X vectors have been spread evenly over
   the physical processors and selects an MSI-X vector per hardware
   queue based on that assumption. Since neither the kernel nor
   irqbalance currently supports this approach, I wrote a script to
   implement it (see also
http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/21312/focus=98409).
- We need support in irqbalance for HBAs that support multiple MSI-X
   vectors. Last time I checked, irqbalance did not support this
   concept, which means that irqbalance could even assign several of
   these interrupt vectors to the same CPU core, something that does
   not make sense to me.

A previous discussion about this topic is available in the following
e-mail thread: Christoph Hellwig, [TECH TOPIC] IRQ affinity, linux-rdma
and linux-kernel mailing lists, July 2015
(http://thread.gmane.org/gmane.linux.drivers.rdma/27418). I would
appreciate it if Matthew Wilcox's proposal could be discussed further
during LSF/MM (http://thread.gmane.org/gmane.linux.drivers.rdma/27418).

Thanks,

Bart.

Thread overview: 8+ messages
2016-02-02 16:31 [LSF/MM TOPIC] multiqueue and interrupt assignment Hannes Reinecke
2016-02-02 18:23 ` Bart Van Assche [this message]
2016-02-03 12:57   ` Sagi Grimberg
2016-02-03 13:13     ` Hannes Reinecke
2016-02-03 13:32       ` Sagi Grimberg
2016-02-03 15:03         ` Hannes Reinecke
2016-03-03  7:59           ` Ming Lei
2016-02-02 18:45 ` Elliott, Robert (Persistent Memory)
