From: Keith Busch <keith.busch@intel.com>
To: Christoph Hellwig <hch@lst.de>
Cc: axboe@fb.com, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 4/7] blk-mq: allow the driver to pass in an affinity mask
Date: Tue, 6 Sep 2016 13:30:53 -0400
Message-ID: <20160906173053.GC25201@localhost.localdomain>
In-Reply-To: <20160906165056.GB26214@lst.de>

On Tue, Sep 06, 2016 at 06:50:56PM +0200, Christoph Hellwig wrote:
> [adding Thomas as it's about the affinity_mask he (we) added to the
>  IRQ core]
> > Here's my topology info:
> > 
> >   # numactl --hardware
> >   available: 2 nodes (0-1)
> >   node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> >   node 0 size: 15745 MB
> >   node 0 free: 15319 MB
> >   node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> >   node 1 size: 16150 MB
> >   node 1 free: 15758 MB
> >   node distances:
> >   node   0   1
> >     0:  10  21
> >     1:  21  10
> 
> How do you get that mapping?  Does this CPU use Hyperthreading and
> thus expose siblings using topology_sibling_cpumask?  As that's the
> only thing the old code used for any sort of special casing.
> 
> I'll need to see if I can find a system with such a mapping to reproduce.

Yes, this is a two-socket server with hyperthreading enabled. Numbering
the physical CPUs before the hyperthreads is a common scheme on x86, so
we're going to see this split numbering on any multi-socket
hyperthreaded server.

The topology_sibling_cpumask shows the right information. The resulting
mask from cpu 0 on my server is 0x00010001; cpu 1 is 0x00020002, etc...
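
As a rough illustration of where those values come from, here is a
userspace sketch that rebuilds the sibling mask from the numbering in the
numactl output above. It assumes 16 physical cores numbered 0-15 with
their hyperthreads numbered 16-31; sibling_mask() and NR_CORES are
made-up names for this example and are not the kernel's
topology_sibling_cpumask() implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 physical cores (CPUs 0-15), hyperthreads are CPUs 16-31. */
#define NR_CORES 16u

/*
 * Hypothetical stand-in for topology_sibling_cpumask(): returns a bitmask
 * with one bit set for the physical CPU and one for its hyperthread.
 */
static uint32_t sibling_mask(unsigned int cpu)
{
	unsigned int core;

	assert(cpu < 2 * NR_CORES);
	core = cpu % NR_CORES;	/* CPU n and CPU n + 16 share a core */

	return (UINT32_C(1) << core) | (UINT32_C(1) << (core + NR_CORES));
}
```

Under that assumption, sibling_mask(0) is 0x00010001 and sibling_mask(1)
is 0x00020002, and CPU 16 gets the same mask as CPU 0.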

> > What we want for my CPU topology is the 16th CPU to pair with CPU 0,
> > 17 pairs with 1, 18 with 2, and so on. You can't convey that information
> > with this scheme. We need affinity_masks per vector.
> 
> We actually have per-vector masks, but they are hidden inside the IRQ
> core and awkward to use.  We could do the get_first_sibling magic
> in the block-mq queue mapping (and in fact with the current code I guess
> we need to).  Or take a step back from trying to emulate the old code
> and instead look at NUMA nodes instead of siblings which some folks
> suggested a while ago.

Adding the first sibling magic in blk-mq would fix my specific case,
but it doesn't help generically when we need to pair more than just thread
siblings.
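
To make the limitation concrete, here is a simplified userspace sketch of
a first-sibling mapping. It is not the blk-mq code: sibling_mask(),
first_sibling() and cpu_to_queue() are hypothetical helpers, the 16-core
split numbering is assumed from the numactl output above, and the modulo
spread over queues is chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 physical cores (CPUs 0-15), hyperthreads are CPUs 16-31. */
#define NR_CORES 16u

/* Hypothetical stand-in for the per-CPU thread sibling mask. */
static uint32_t sibling_mask(unsigned int cpu)
{
	unsigned int core = cpu % NR_CORES;

	return (UINT32_C(1) << core) | (UINT32_C(1) << (core + NR_CORES));
}

/* Lowest-numbered CPU in the sibling mask (the "first sibling"). */
static unsigned int first_sibling(unsigned int cpu)
{
	uint32_t mask = sibling_mask(cpu);
	unsigned int bit = 0;

	/* The mask always has at least one bit set, so this terminates. */
	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* Illustrative only: spread first siblings over the queues by modulo. */
static unsigned int cpu_to_queue(unsigned int cpu, unsigned int nr_queues)
{
	assert(nr_queues > 0);
	return first_sibling(cpu) % nr_queues;
}
```

With 16 queues, cpu_to_queue(16, 16) lands on queue 0 alongside CPU 0,
which is the pairing I want. With 8 queues the same sketch also puts CPU 8
on queue 0, even though the numactl output lists CPU 8 under node 1, and
thread sibling information alone can't tell you that.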
