From: willy@linux.intel.com (Matthew Wilcox)
Subject: [PATCH] NVMe: SQ/CQ NUMA locality
Date: Wed, 13 Mar 2013 02:48:47 -0400	[thread overview]
Message-ID: <20130313064847.GH4530@linux.intel.com> (raw)
In-Reply-To: <1359422441-26433-1-git-send-email-keith.busch@intel.com>

On Mon, Jan 28, 2013 at 06:20:41PM -0700, Keith Busch wrote:
> This is related to an item off the "TODO" list that suggests experimenting
> with NUMA locality. There is no dma alloc routine that takes a NUMA node id,
> so the allocations are done a bit differently. I am not sure if this is the
> correct way to use dma_map/unmap_single, but it seems to work fine.

Ah ... works fine on Intel architectures ... not so fine on
other architectures.  We'd have to add in explicit calls to
dma_sync_single_for_cpu() and dma_sync_single_for_device(), and that's
just not going to be efficient.
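
To make the cost concrete, here's a rough sketch of what every look at
the completion queue would turn into on a non-coherent platform (not
from your patch; the nvme_queue fields and CQ_SIZE() are assumed to
follow the driver's current layout):

/* needs <linux/dma-mapping.h> */
static bool nvme_cqe_valid_noncoherent(struct nvme_queue *nvmeq,
				       u16 head, u16 phase)
{
	bool valid;

	/* hand the CQ memory back to the CPU before reading it */
	dma_sync_single_for_cpu(nvmeq->q_dmadev, nvmeq->cq_dma_addr,
				CQ_SIZE(nvmeq->q_depth), DMA_FROM_DEVICE);

	valid = (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;

	/* give ownership back so the device can post more CQEs */
	dma_sync_single_for_device(nvmeq->q_dmadev, nvmeq->cq_dma_addr,
				   CQ_SIZE(nvmeq->q_depth), DMA_FROM_DEVICE);
	return valid;
}

That's two extra calls (and, on non-coherent platforms, the cache
maintenance behind them) every time we poll the queue, which is exactly
the overhead we don't want in the fast path.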

> I tested this on an Intel SC2600C0 server with two E5-2600 Xeons (32 total
> cpu threads) with all memory sockets fully populated and giving two NUMA
> domains.  The only NVMe device I can test with is a pre-alpha level with an
> FPGA, so it doesn't run as fast as it could, but I could still measure a
> small difference using fio, though not a very significant difference.
> 
> With NUMA:
> 
>    READ: io=65534MB, aggrb=262669KB/s, minb=8203KB/s, maxb=13821KB/s, mint=152006msec, maxt=255482msec
>   WRITE: io=65538MB, aggrb=262681KB/s, minb=8213KB/s, maxb=13792KB/s, mint=152006msec, maxt=255482msec
> 
> Without NUMA:
> 
>    READ: io=65535MB, aggrb=257995KB/s, minb=8014KB/s, maxb=13217KB/s, mint=159122msec, maxt=264339msec
>   WRITE: io=65537MB, aggrb=258001KB/s, minb=8035KB/s, maxb=13198KB/s, mint=159122msec, maxt=264339msec

I think we can get in trouble for posting raw numbers ... so let's
pretend you simply said "About a 2% performance improvement".  Now, OK,
that doesn't sound like much, but that's significant enough to make this
worth pursuing.

So ... I think we need to add a dma_alloc_attrs_node() or something,
and pass the nid all the way down to the ->alloc routine.
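
Roughly this shape, to be explicit (purely hypothetical -- neither
dma_alloc_attrs_node() nor an ->alloc_node method exists today, so take
the names with a grain of salt):

/* needs <linux/dma-mapping.h> */
void *dma_alloc_attrs_node(struct device *dev, size_t size,
			   dma_addr_t *dma_handle, gfp_t flag,
			   struct dma_attrs *attrs, int nid)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	/* fall back to the device-local default when no node is given */
	if (nid == NUMA_NO_NODE)
		nid = dev_to_node(dev);

	/*
	 * ->alloc_node is the hypothetical part: each dma_map_ops
	 * implementation would have to grow it and use the node for its
	 * page allocation instead of dev_to_node(dev).
	 */
	if (ops->alloc_node)
		return ops->alloc_node(dev, size, dma_handle, flag,
				       attrs, nid);

	return ops->alloc(dev, size, dma_handle, flag, attrs);
}

A dma_alloc_coherent_node() wrapper on top of that would then be
trivial.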

Another thing I'd like you to try is allocating *only* the completion
queue local to the CPU's node, i.e. allocate the submission queue on
the node local to the device and the completion queue on the node local
to the CPU that is using it.

My reason for thinking this is a good idea is the assumption that
cross-node writes are cheaper than reads.  So having the CPU write to
remote memory, the device read from local memory, then the device write
to remote memory and the CPU read from local memory should work out
better than either allocating both the submission & completion queues
local to the CPU or local to the device.

I think that dma_alloc_coherent currently allocates memory local to the
device, so all you need to do to test this theory is revert the half of
your patch which allocates the submission queue local to the CPU.
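
So the end state I'm suggesting looks roughly like this (just a sketch:
SQ_SIZE/CQ_SIZE and the nvme_queue fields are taken from the current
driver, error handling is trimmed, and the real code still has to keep
the queue bases page-aligned as the spec requires):

/* needs <linux/dma-mapping.h> and <linux/slab.h> */
static int nvme_alloc_queue_mem(struct nvme_queue *nvmeq,
				struct device *dmadev,
				int depth, int cpu_node)
{
	/*
	 * SQ: the device reads it, so leave it wherever
	 * dma_alloc_coherent puts it (local to the device).
	 */
	nvmeq->sq_cmds = dma_alloc_coherent(dmadev, SQ_SIZE(depth),
					    &nvmeq->sq_dma_addr, GFP_KERNEL);
	if (!nvmeq->sq_cmds)
		return -ENOMEM;

	/*
	 * CQ: the CPU reads it, so allocate it on that CPU's node and
	 * set up a streaming mapping for the device to write into.
	 */
	nvmeq->cqes = kzalloc_node(CQ_SIZE(depth), GFP_KERNEL, cpu_node);
	if (!nvmeq->cqes)
		goto free_sq;

	nvmeq->cq_dma_addr = dma_map_single(dmadev, (void *)nvmeq->cqes,
					    CQ_SIZE(depth), DMA_FROM_DEVICE);
	if (dma_mapping_error(dmadev, nvmeq->cq_dma_addr))
		goto free_cq;

	return 0;

free_cq:
	kfree((void *)nvmeq->cqes);
free_sq:
	dma_free_coherent(dmadev, SQ_SIZE(depth), nvmeq->sq_cmds,
			  nvmeq->sq_dma_addr);
	return -ENOMEM;
}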

Thanks for trying this out!
