From: Chuck Lever <chuck.lever@oracle.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: List Linux RDMA Mailing <linux-rdma@vger.kernel.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v2 05/13] svcrdma: Introduce local rdma_rw API helpers
Date: Thu, 30 Mar 2017 10:29:41 -0500
Message-ID: <370FEB6E-1E2C-4578-91FC-A035ADF16F78@oracle.com>
In-Reply-To: <20170330123014.GA32364@infradead.org>
> On Mar 30, 2017, at 7:30 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
>> + spinlock_t sc_rw_ctxt_lock;
>> + struct list_head sc_rw_ctxts;
>
> It's a little sad that we always need a list and a spinlock when
> most requests should need a single context only.

The current code needs resources protected by several
spinlocks, some of which disable bottom halves. This rewrite
takes it down to just this one plain vanilla spinlock, which
picks up all the svcrdma layer resources needed for the I/O
at once.

There are some common cases which can require more than one
of these contexts.

My point is, I think this is better than trips to a memory
allocator, because those frequently require at least one
BH-disabled or irqsave spinlock. Avoiding the allocator helps
prevent latency outliers and, rarely, allocation failures.

That said, I will happily consider any solution that does
not require critical sections!

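
To illustrate the trade-off, here is a minimal user-space sketch of a
context cache guarded by one lock. It is not the patch's code: a
pthread mutex stands in for the plain spinlock, the names rw_ctxt,
ctxt_pool, pool_init, pool_get and pool_put are hypothetical, and the
malloc() fallback when the cache is empty is an assumption made for
the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a per-transport R/W context. */
struct rw_ctxt {
	struct rw_ctxt *next;	/* link on the free list */
	int in_use;
};

/* Hypothetical analogue of the sc_rw_ctxt_lock / sc_rw_ctxts pair. */
struct ctxt_pool {
	pthread_mutex_t lock;	/* plays the role of the plain spinlock */
	struct rw_ctxt *free;	/* cached contexts, most recent first */
};

static void pool_init(struct ctxt_pool *pool)
{
	pthread_mutex_init(&pool->lock, NULL);
	pool->free = NULL;
}

/* Take a cached context if one exists; allocate only when the cache is empty. */
static struct rw_ctxt *pool_get(struct ctxt_pool *pool)
{
	struct rw_ctxt *ctxt;

	pthread_mutex_lock(&pool->lock);
	ctxt = pool->free;
	if (ctxt)
		pool->free = ctxt->next;
	pthread_mutex_unlock(&pool->lock);

	if (!ctxt) {
		ctxt = malloc(sizeof(*ctxt));
		if (!ctxt)
			return NULL;
	}
	ctxt->next = NULL;
	ctxt->in_use = 1;
	return ctxt;
}

/* Return a context to the cache instead of freeing it. */
static void pool_put(struct ctxt_pool *pool, struct rw_ctxt *ctxt)
{
	assert(ctxt->in_use);
	ctxt->in_use = 0;

	pthread_mutex_lock(&pool->lock);
	ctxt->next = pool->free;
	pool->free = ctxt;
	pthread_mutex_unlock(&pool->lock);
}
```

In the steady state every pool_get() is satisfied from the list, so
the only critical section on the I/O path is the short one shown here.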
>> + * Each WR chain handles a single contiguous server-side buffer,
>> + * because some registration modes (eg. FRWR) do not support a
>> + * discontiguous scatterlist.
>
> Both FRWR and FMR have no problem with a discontiguous page list,
> they only have a problem with any segment but the first not starting
> page aligned. For NFS you'll need vectored direct I/O to hit that
> case.
I'll rewrite the comment.
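
As a rough illustration of the constraint described in the quoted
text, the check below accepts a list of buffer elements only when
every element after the first starts on a page boundary. This is a
sketch, not kernel code: struct sketch_sge, SKETCH_PAGE_SIZE and
single_registration_ok() are hypothetical names, the 4096-byte page
size is an assumption, and real registration code may impose further
conditions that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this illustration */

/* Hypothetical description of one element of a page list. */
struct sketch_sge {
	size_t offset;	/* byte offset of the data within its first page */
	size_t length;	/* number of bytes in this element */
};

/*
 * Return true when only the first element may start at a non-zero
 * page offset, which is the condition stated in the quoted review.
 */
static bool single_registration_ok(const struct sketch_sge *sge, size_t count)
{
	size_t i;

	assert(sge != NULL || count == 0);

	for (i = 1; i < count; i++)
		if (sge[i].offset % SKETCH_PAGE_SIZE != 0)
			return false;
	return true;
}
```

A head that starts mid-page followed by page-aligned elements passes
this check. A tail that starts mid-page in a later position does not,
and would need a registration of its own.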

For the Write chunk path, each RDMA segment in the chunk
can have a different R_key. So each non-empty segment gets
its own rdma_rw chain. If the client is good, it will use
a single large segment, but not all of them do.

The Reply chunk case occurs commonly, and can require
three or more separate scatterlists, due to the alignment
constraint. Each RPC Reply resides in an xdr_buf, each of
which has up to three portions:

1. A head, which is not necessarily page-aligned,
2. A page list, which does not have to be page-aligned, and
3. A tail, which is frequently but not always in the same page
   as the head (and is thus not expected to be page-aligned).

The client can provide multiple segments, each with its
own R_key. The server has to fit the RDMA Writes into both
the alignment constraints of the xdr_buf components, and
the segments provided by the client.

This is why I organized the "write the reply chunk" path
this way.

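
The fitting problem described above can be sketched in user space as
follows. This is an illustration under stated assumptions, not the
code in the patch: plan_reply_writes(), struct planned_write and the
XB_* constants are hypothetical names, each xdr_buf portion is treated
as one contiguous run of bytes, and a write is cut whenever either
the current portion or the current client segment runs out.

```c
#include <assert.h>
#include <stddef.h>

/* The three xdr_buf portions, in the order they are sent. */
enum { XB_HEAD, XB_PAGES, XB_TAIL, XB_NPARTS };

/* One planned RDMA Write: bytes from one portion into one segment. */
struct planned_write {
	int part;		/* XB_HEAD, XB_PAGES or XB_TAIL */
	size_t part_off;	/* offset within that portion */
	unsigned int seg;	/* index of the client segment */
	size_t seg_off;		/* offset within that segment */
	size_t len;		/* bytes covered by this write */
};

/*
 * Fill @out with the planned writes. Returns the number of writes,
 * or -1 if the segments are too small or @out has too few slots.
 */
static int plan_reply_writes(const size_t part_len[XB_NPARTS],
			     const size_t *seg_len, unsigned int nsegs,
			     struct planned_write *out, int max_out)
{
	unsigned int seg = 0;
	size_t seg_off = 0;
	int part, n = 0;

	for (part = 0; part < XB_NPARTS; part++) {
		size_t part_off = 0;

		while (part_off < part_len[part]) {
			size_t len;

			/* Skip segments that are empty or already full. */
			while (seg < nsegs && seg_off == seg_len[seg]) {
				seg++;
				seg_off = 0;
			}
			if (seg == nsegs || n == max_out)
				return -1;

			/* Stop at the end of the portion or the segment. */
			len = part_len[part] - part_off;
			if (len > seg_len[seg] - seg_off)
				len = seg_len[seg] - seg_off;
			assert(len > 0);

			out[n].part = part;
			out[n].part_off = part_off;
			out[n].seg = seg;
			out[n].seg_off = seg_off;
			out[n].len = len;
			n++;

			part_off += len;
			seg_off += len;
		}
	}
	return n;
}
```

For a 100-byte head, an 8192-byte page list and a 20-byte tail sent
into three 4096-byte segments, this plan contains five writes: the
page list is split at both segment boundaries, and the tail shares the
last segment with the end of the page list.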
Thanks to both you and Sagi for excellent review comments.
--
Chuck Lever
Thread overview: 20+ messages
2017-03-27 13:47 [PATCH v2 00/13] Server-side NFS/RDMA changes proposed for v4.12 Chuck Lever
2017-03-27 13:48 ` [PATCH v2 01/13] svcrdma: Move send_wr to svc_rdma_op_ctxt Chuck Lever
2017-03-30 12:06 ` Christoph Hellwig
2017-03-27 13:48 ` [PATCH v2 02/13] svcrdma: Add svc_rdma_map_reply_hdr() Chuck Lever
2017-03-30 12:07 ` Christoph Hellwig
2017-03-27 13:48 ` [PATCH v2 03/13] svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT Chuck Lever
2017-03-30 12:07 ` Christoph Hellwig
2017-03-27 13:48 ` [PATCH v2 04/13] svcrdma: Add helper to save pages under I/O Chuck Lever
2017-03-30 12:08 ` Christoph Hellwig
2017-03-27 13:48 ` [PATCH v2 05/13] svcrdma: Introduce local rdma_rw API helpers Chuck Lever
2017-03-30 12:30 ` Christoph Hellwig
2017-03-30 15:29 ` Chuck Lever [this message]
2017-03-27 13:48 ` [PATCH v2 06/13] svcrdma: Use rdma_rw API in RPC reply path Chuck Lever
2017-03-27 13:48 ` [PATCH v2 07/13] svcrdma: Clean up RDMA_ERROR path Chuck Lever
2017-03-27 13:48 ` [PATCH v2 08/13] svcrdma: Report Write/Reply chunk overruns Chuck Lever
2017-03-27 13:49 ` [PATCH v2 09/13] svcrdma: Clean up RPC-over-RDMA backchannel reply processing Chuck Lever
2017-03-27 13:49 ` [PATCH v2 10/13] svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxt Chuck Lever
2017-03-27 13:49 ` [PATCH v2 11/13] svcrdma: Remove unused RDMA Write completion handler Chuck Lever
2017-03-27 13:49 ` [PATCH v2 12/13] svcrdma: Remove the req_map cache Chuck Lever
2017-03-27 13:49 ` [PATCH v2 13/13] svcrdma: Clean out old XDR encoders Chuck Lever