[PATCHv3 0/3] nvme: NUMA locality for fabrics

linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

From: hch@lst.de (Christoph Hellwig)
Subject: [PATCHv3 0/3] nvme: NUMA locality for fabrics
Date: Wed, 21 Nov 2018 09:36:20 +0100	[thread overview]
Message-ID: <20181121083620.GA29382@lst.de> (raw)
In-Reply-To: <fd5ca98e-d0ed-5551-39d1-487bc6a7760f@broadcom.com>

On Tue, Nov 20, 2018@11:27:04AM -0800, James Smart wrote:
> - What are the latencies that are meaningful ?? is it solely single digit 
> us ? 10-30us ?? 1ms ?? 50ms ?? 100ms ???? How do things change if the 
> latency change ?

Well, we did spend a lot of effort on making sure sub-10us latencies
work.  For real life setups with multiple cpus in use the 100us order
of magniture is probably more interesting.

> - At what point does it become more important to get commands to the 
> subsystem (via a different queue or queues on different controllers) so 
> they can be being worked on in parallel vs the latency of a single io ??? 
> How is this need communicated to the nvme layer so we can make the right 
> choices ??? Queue counts, queue size, and MAXCMD limits (thanks Keith), 
> may cause throttles that increase this need.?? To that end - what are 
> your expectations for queue size or MAXCMD limits vs the latencies vs load 
> from a single cpu?? or a set of cpus ?

If you fully load the queue you are of course going to see massive
additional latency.  That's why you'll see typical NVMe PCIe device
(or RDMA setups) massively overprovisioned in number of queues and/or
queue depth.

> - Must every cpu get a queue ??? what if the controller won't support a 
> queue per cpu ?? how do things change if only 1 in 16 or less of the cpu's 
> get queues ?? Today, if less than cpu count queues aren't supported - 
> aren't queues for the different controllers likely mapping to the same cpus 
> ? I don't think there's awareness of what queues on other controllers are 
> present so that there could be redundant paths mapped to cpus that aren't 
> bound already. And if such logic were added, how does that affect the 
> multipathing choices ?

Less than cpu count queues are perfectly supported, we'll just start
sharing queues.  In general as long as you share queues between cores
on the same socket things still work reasonably fine.  Once you start
sharing a queue between sockets you are going to be in a world of pain.

> - What if application load is driven only by specific cpu's - if a "cpu was 
> given" to the specific task (constrained app, VMs, or containers. how would 
> we know that?) how does that map if multiple cpus are sharing a queue ? 
> will multipathing and load choices be made system-wide, specific to a cpu 
> set, or a single cpu ?

Right now we do multipathing devisions per node (aka per socket for
todays typical systems).

> - What if a cpu is finally out of cycles due to load, how can we find out 
> if the io must be limited to a cpu or whether affinity can be subverted so 
> that use of other idle cpus can share in the processing load for the 
> queue(s) for that cpu ?

For the traditional interrupt driven model we really need to process
the interrupts on the submitting cpu to throttle.  For an interesting
model to move the I/O completion load to another thread and thus potential
cpu look at the aio poll patches Jens just posted.

> If we're completely focused on cpu affinity with a queue, what happens when 
> not all queues are equal. There is lots of talk of specific queues 
> providing specific functionality that other queues wouldn't support. What 
> if queues 1-M do writes with dedup, queues N-P do writes with compression, 
> and Q-Z do accelerated writes to a local cache.? How do you see the 
> current linux implementation migrating to something like that ?

Very unlikely.  We have support for a few queu types now in the for-4.21
block tree (default, reads, polling), but the above just sounds way too
magic.

next prev parent reply	other threads:[~2018-11-21  8:36 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-02  9:56 [PATCHv3 0/3] nvme: NUMA locality for fabrics Hannes Reinecke
2018-11-02  9:56 ` [PATCH 1/3] nvme: NUMA locality information " Hannes Reinecke
2018-11-08  9:22   ` Christoph Hellwig
2018-11-08  9:35     ` Hannes Reinecke
2018-11-02  9:56 ` [PATCH 2/3] nvme-multipath: Select paths based on NUMA locality Hannes Reinecke
2018-11-08  9:32   ` Christoph Hellwig
2018-11-02  9:56 ` [PATCH 3/3] nvme-multipath: automatic NUMA path balancing Hannes Reinecke
2018-11-08  9:36   ` Christoph Hellwig
2018-11-16  8:12 ` [PATCHv3 0/3] nvme: NUMA locality for fabrics Christoph Hellwig
2018-11-16  8:21   ` Hannes Reinecke
2018-11-16  8:23     ` Christoph Hellwig
2018-11-19 22:31       ` Sagi Grimberg
2018-11-20  6:12         ` Hannes Reinecke
2018-11-20  9:41           ` Christoph Hellwig
2018-11-20 15:47             ` Keith Busch
2018-11-20 19:27               ` James Smart
2018-11-21  8:36                 ` Christoph Hellwig [this message]
2018-11-20 16:21             ` Hannes Reinecke
2018-11-20 18:12             ` James Smart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181121083620.GA29382@lst.de \
    --to=hch@lst.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).