Re: [SPDK] NVMe RDMA SGL Support

From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] NVMe RDMA SGL Support
Date: Thu, 03 May 2018 19:43:06 +0000	[thread overview]
Message-ID: <1525376584.22849.67.camel@intel.com> (raw)
In-Reply-To: CA++50Vcqd=efDuKvOhe9oG2jDB6k8f7GSbWsov4WgGG7uRDPjw@mail.gmail.com

On Thu, 2018-05-03 at 19:11 +0000, Mikhail altman wrote:
> Hello Everyone,
> 
> On SPDK v18.01, we noticed there's a TODO in nvme_rdma_build_sgl_request() in
> nvme_rdma.c.
> 
> Some code for context:
> 
>     /* TODO: for now, we only support a single SGL entry */
>     rc = req->payload.u.sgl.next_sge_fn(req->payload.u.sgl.cb_arg, &virt_addr, &length);
>     if (rc) {
>             return -1;
>     }
> 
>     if (length < req->payload_size) {
>             SPDK_ERRLOG("multi-element SGL currently not supported for RDMA\n");
>             return -1;
>     }
> 
> Is there any ongoing discussion or work to implement support for multiple SGL
> entries? (I looked at the Trello board and GerritHub, but couldn't find
> anything related.) If not, we can look into making a patch for this on our
> end. Any thoughts about what this would entail are welcome!

Hi Mike,

John has been working in this area. It's great to see that he'll have patches to
take a look at shortly. I just wanted to clarify a few things.

This isn't much of a limitation for the use cases we support today. The
initiator buffers can already be scattered; it's only the target memory for a
single I/O that must be described by a single element. Since the RDMA NIC is
pulling the data over the network and placing it into the local target system's
memory, it is simple enough to have it simultaneously gather it into a single
contiguous memory region.
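
For reference, here is a rough sketch of what lifting the single-entry check in the quoted code might look like: call the next-SGE callback in a loop until the payload is covered. This is not SPDK code. The names struct sge, struct sgl_iter, next_sge() and gather_sges() are made up for illustration, the callback only mirrors the shape of the next_sge_fn call quoted above, and a real RDMA path would also need a memory registration key and a keyed SGL descriptor per element, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are not SPDK types. */
struct sge {
	void *addr;
	uint32_t len;
};

/* Plays the role of cb_arg: a cursor over a caller-owned element array. */
struct sgl_iter {
	const struct sge *elems;
	size_t count;
	size_t next;
};

/* Same shape as the quoted next_sge_fn call: returns 0 on success. */
static int next_sge(void *cb_arg, void **virt_addr, uint32_t *length)
{
	struct sgl_iter *it = cb_arg;

	if (it->next >= it->count) {
		return -1; /* no more elements */
	}
	*virt_addr = it->elems[it->next].addr;
	*length = it->elems[it->next].len;
	it->next++;
	return 0;
}

/*
 * Collect elements until payload_size bytes are covered.
 * Returns the number of elements written to out, or -1 if the list runs
 * out early or needs more than max_out elements.
 */
static int gather_sges(struct sgl_iter *it, uint32_t payload_size,
		       struct sge *out, size_t max_out)
{
	uint32_t remaining = payload_size;
	size_t n = 0;

	assert(it != NULL && out != NULL);

	while (remaining > 0) {
		void *addr;
		uint32_t len;

		if (n == max_out || next_sge(it, &addr, &len) != 0) {
			return -1;
		}
		if (len > remaining) {
			len = remaining; /* last element may be only partly used */
		}
		out[n].addr = addr;
		out[n].len = len;
		remaining -= len;
		n++;
	}
	return (int)n;
}
```

With three 4096-byte elements and a 10000-byte payload, gather_sges() returns 3 and trims the last element to 1808 bytes; it fails if the list is too short or max_out is too small.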

That said, I can see at least a few use cases for this. One would be to change
the way the memory pool is allocated in the NVMe-oF target. Today, it allocates
four full queue depths' worth of maximum-I/O-size buffers in a shared pool for
all connections to use. If we had full support for scatter-gather lists, we
could change this pool to contain an equivalent amount of 4 KiB buffers. Then
each I/O could pull a list of buffers instead of a single big one, and we'd end
up with better memory utilization. We already have the required
scatter-gather-aware APIs through the rest of the stack to make this happen.
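
To make the pool idea concrete, here is a minimal sketch, assuming a simple free-list of 4 KiB buffers from which each I/O pops just enough to cover its size and describes them with an iovec array. None of these names (struct buf_pool, pool_init(), pool_get_iovs(), pool_put_iovs(), pool_destroy()) exist in SPDK. A real target would allocate RDMA-registered, DMA-safe memory and handle concurrent access, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>

#define BUF_SIZE 4096u /* assumed small-buffer size (4 KiB) */

/* Hypothetical pool: a stack of free fixed-size buffers. */
struct buf_pool {
	void **free_bufs;
	size_t free_count;
	size_t capacity;
};

static void pool_destroy(struct buf_pool *p)
{
	for (size_t i = 0; i < p->free_count; i++) {
		free(p->free_bufs[i]);
	}
	free(p->free_bufs);
	p->free_bufs = NULL;
	p->free_count = 0;
	p->capacity = 0;
}

static int pool_init(struct buf_pool *p, size_t capacity)
{
	p->free_bufs = calloc(capacity, sizeof(void *));
	p->free_count = 0;
	p->capacity = capacity;
	if (p->free_bufs == NULL) {
		return -1;
	}
	for (size_t i = 0; i < capacity; i++) {
		void *buf = malloc(BUF_SIZE);

		if (buf == NULL) {
			pool_destroy(p);
			return -1;
		}
		p->free_bufs[p->free_count++] = buf;
	}
	return 0;
}

/*
 * Pop enough buffers to cover io_size bytes and describe them in iov.
 * Returns the number of iovecs used, or -1 if the pool or iov is too small.
 */
static int pool_get_iovs(struct buf_pool *p, size_t io_size,
			 struct iovec *iov, size_t max_iov)
{
	size_t n = (io_size + BUF_SIZE - 1) / BUF_SIZE;
	size_t remaining = io_size;

	if (n > max_iov || n > p->free_count) {
		return -1;
	}
	for (size_t i = 0; i < n; i++) {
		iov[i].iov_base = p->free_bufs[--p->free_count];
		iov[i].iov_len = remaining < BUF_SIZE ? remaining : BUF_SIZE;
		remaining -= iov[i].iov_len;
	}
	return (int)n;
}

/* Return the buffers of a completed I/O to the pool. */
static void pool_put_iovs(struct buf_pool *p, const struct iovec *iov, size_t n)
{
	assert(p->free_count + n <= p->capacity);
	for (size_t i = 0; i < n; i++) {
		p->free_bufs[p->free_count++] = iov[i].iov_base;
	}
}
```

With this layout a 10000-byte I/O takes three buffers (12 KiB) rather than one maximum-I/O-size buffer, which is where the utilization gain would come from.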

The other use case is one where we switch our model to use memory provided by
the backing bdev for the RDMA transfer instead of using a separate dedicated
pool allocated by the NVMe-oF target. That backing bdev may need to provide the
memory as a scatter-gather list for various reasons (this is John's use case).
This is the long-term direction for the NVMe-oF target.

In addition to enabling custom bdevs to provide scatter-gather lists for
whatever reason, this would also enable things like zero-copy transfers directly
to persistent memory or to a local NVMe SSD's controller memory buffer. That
would effectively eliminate the single bounce we do from RDMA NIC to host
memory to persistent storage device, and would probably shave an additional ~3
microseconds off the round-trip latency for these cases.

These are all cool projects that are worthy of time and effort. If you all are
willing to work in this area, please jump in!

> 
> Thanks in advance,
> Mike
