From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH v1 1/8] xprtrdma: Segment head and tail XDR buffers on page boundaries
Date: Fri, 12 Feb 2016 16:06:02 -0500
Message-ID: <20160212210602.5278.57457.stgit@manet.1015granger.net>
In-Reply-To: <20160212205107.5278.55938.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
A single memory allocation holds the pair of buffers in which the
RPC client builds an RPC call message and decodes its matching
reply. Each buffer is sized for the maximum possible RPC call or
reply message of the operation in progress. As the call buffer
grows, the start of the reply buffer is therefore pushed farther
into the memory allocation.

RPC requests are growing in size. It used to be that both the call
and reply buffers fit inside a single page.

These days, thanks to NFSv4 (and especially security labels in
NFSv4.2), the maximum call and reply sizes are large. NFSv4.0 OPEN,
for example, now requires a 6KB allocation for a pair of call and
reply buffers, and NFSv4 LOOKUP is not far behind.

As the maximum size of a call increases, the reply buffer is pushed
far enough into the buffer's memory allocation that a page boundary
can appear in the middle of it.
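
To make the arithmetic concrete, here is a minimal user-space C sketch
(not kernel code). It assumes 4KB pages and a page-aligned allocation;
the helper names and the even 3KB/3KB split of the 6KB allocation are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4KB pages */

/*
 * Hypothetical helper: true if the byte range [start, start + len),
 * measured from the start of a page-aligned allocation, spans a
 * page boundary.
 */
static bool sketch_spans_page(size_t start, size_t len)
{
	if (len == 0)
		return false;
	return start / SKETCH_PAGE_SIZE !=
	       (start + len - 1) / SKETCH_PAGE_SIZE;
}

/* Small buffers share one page; a reply buffer at offset 3KB does not. */
static void sketch_example(void)
{
	assert(!sketch_spans_page(512, 512));	/* reply ends at 1023 */
	assert(sketch_spans_page(3072, 3072));	/* reply ends at 6143 */
}
```

With a 3KB call buffer, a 3KB reply buffer starts at offset 3072 and
ends at offset 6143, so it straddles the page boundary at offset 4096.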

When the maximum possible reply size is larger than the client's
RDMA receive buffers (currently 1KB), the client has to register a
Reply chunk for the server to RDMA Write the reply into.

The logic in rpcrdma_convert_iovs() assumes that the xdr_buf head
and tail buffers are each contained in a single page. It supplies
just one segment for the head and one for the tail.

FMR, for example, registers only up to a page boundary, which covers
just a portion of the reply buffer in the OPEN case above. Without
additional segments, it doesn't register the rest of the buffer.

When the server tries to write the OPEN reply, the RDMA Write fails
with a remote access error since the client registered only part of
the Reply chunk.

rpcrdma_convert_iovs() must split the xdr_buf head and tail into
multiple segments, each of which is guaranteed not to contain a page
boundary. That way fmr_op_map() is given the proper number of
segments to register the whole reply buffer.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
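
The splitting described above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the patch itself:
it assumes 4KB pages, and struct sketch_seg and sketch_split() are
hypothetical stand-ins for struct rpcrdma_mr_seg and the new
rpcrdma_convert_kvec() helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4KB pages */

/* Hypothetical stand-in for struct rpcrdma_mr_seg. */
struct sketch_seg {
	uintptr_t offset;	/* start address of the segment */
	size_t len;		/* segment length in bytes */
};

/*
 * Split [base, base + len) into segments that never cross a page
 * boundary. Fills seg[n] onward and returns the new segment count,
 * which is at most nsegs.
 */
static int sketch_split(uintptr_t base, size_t len,
			struct sketch_seg *seg, int n, int nsegs)
{
	size_t page_offset = base % SKETCH_PAGE_SIZE;

	assert(n >= 0 && n <= nsegs);
	while (len && n < nsegs) {
		/* Bytes left before the next page boundary. */
		size_t room = SKETCH_PAGE_SIZE - page_offset;
		size_t chunk = len < room ? len : room;

		seg[n].offset = base;
		seg[n].len = chunk;
		base += chunk;
		len -= chunk;
		page_offset = 0;	/* later segments are page-aligned */
		++n;
	}
	return n;
}
```

For a 3KB buffer starting at address 3072, this yields two segments:
1024 bytes ending at the page boundary, then 2048 bytes starting at
4096. A return value equal to nsegs means the segment array is full,
which is the condition the callers in this patch turn into -EIO.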
net/sunrpc/xprtrdma/rpc_rdma.c | 42 ++++++++++++++++++++++++++++++----------
1 file changed, 32 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 0f28f2d..add1f98 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -132,6 +132,33 @@ rpcrdma_tail_pullup(struct xdr_buf *buf)
 	return tlen;
 }
 
+/* Split "vec" on page boundaries into segments. FMR registers pages,
+ * not a byte range. Other modes coalesce these segments into a single
+ * MR when they can.
+ */
+static int
+rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg,
+		     int n, int nsegs)
+{
+	size_t page_offset;
+	u32 remaining;
+	char *base;
+
+	base = vec->iov_base;
+	page_offset = offset_in_page(base);
+	remaining = vec->iov_len;
+	while (remaining && n < nsegs) {
+		seg[n].mr_page = NULL;
+		seg[n].mr_offset = base;
+		seg[n].mr_len = min_t(u32, PAGE_SIZE - page_offset, remaining);
+		remaining -= seg[n].mr_len;
+		base += seg[n].mr_len;
+		++n;
+		page_offset = 0;
+	}
+	return n;
+}
+
 /*
  * Chunk assembly from upper layer xdr_buf.
  *
@@ -150,11 +177,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
 	int page_base;
 	struct page **ppages;
 
-	if (pos == 0 && xdrbuf->head[0].iov_len) {
-		seg[n].mr_page = NULL;
-		seg[n].mr_offset = xdrbuf->head[0].iov_base;
-		seg[n].mr_len = xdrbuf->head[0].iov_len;
-		++n;
+	if (pos == 0) {
+		n = rpcrdma_convert_kvec(&xdrbuf->head[0], seg, n, nsegs);
+		if (n == nsegs)
+			return -EIO;
 	}
 
 	len = xdrbuf->page_len;
@@ -192,13 +218,9 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
 		 * xdr pad bytes, saving the server an RDMA operation. */
 		if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
 			return n;
+		n = rpcrdma_convert_kvec(&xdrbuf->tail[0], seg, n, nsegs);
 		if (n == nsegs)
-			/* Tail remains, but we're out of segments */
 			return -EIO;
-		seg[n].mr_page = NULL;
-		seg[n].mr_offset = xdrbuf->tail[0].iov_base;
-		seg[n].mr_len = xdrbuf->tail[0].iov_len;
-		++n;
 	}
 
 	return n;