From: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: [PATCH v1 03/18] xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers
Date: Mon, 11 Apr 2016 16:10:33 -0400
Message-ID: <20160411201033.20531.42566.stgit@manet.1015granger.net>
In-Reply-To: <20160411200323.20531.8893.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
Send buffer space is shared between the RPC-over-RDMA header and
an RPC message. A large RPC-over-RDMA header leaves less space for
the associated RPC message. If the message no longer fits inline,
it has to be moved via an RDMA Read or Write instead.
As more segments are added to the chunk lists, the header increases
in size. Typical modern hardware needs only a few segments to
convey the maximum payload size, but some devices and registration
modes may need many segments to convey a data payload. Sometimes
so many are needed that the remaining space in the Send buffer is
not enough for the RPC message. Sending such a message usually
fails.
To ensure a transport can always make forward progress, cap the
number of RDMA segments that are allowed in chunk lists. The cap
keeps less-capable devices and memory registration modes from
consuming a large portion of the Send buffer. The cost is a
smaller maximum data payload when using such devices.
For now I choose an arbitrary maximum of 8 RDMA segments. This
allows a maximum-size RPC-over-RDMA header to fit in the current
1024-byte inline threshold with over 700 bytes remaining for an
inline RPC message.
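The header budget can be checked with a short calculation. The sketch
below only restates the arithmetic from the comment this patch adds to
xprt_rdma.h. The macro and function names are illustrative and are not
part of the patch. The 24-byte read segment size is an assumption, taken
as the value implied by the comment's 244-byte total.

```c
#include <assert.h>
#include <stddef.h>

/* All sizes are in bytes. These names are hypothetical; only the
 * numbers come from the comment added to xprt_rdma.h.
 */
#define XDR_WORD	4u
#define READ_SEG_SIZE	(6u * XDR_WORD)	/* assumed: 24 bytes per read segment */
#define HDR_FIXED_SIZE	24u		/* fixed part of the header, per the comment */
#define MAX_HDR_SEGS	8u		/* mirrors RPCRDMA_MAX_HDR_SEGS */

/* Worst-case size of one chunk list that carries nsegs read segments:
 * ((nsegs + 2) * read segment size) + 1 XDR word.
 */
static size_t chunk_list_max(size_t nsegs)
{
	return (nsegs + 2) * READ_SEG_SIZE + XDR_WORD;
}

/* Bytes left for an inline RPC message under a given inline threshold. */
static size_t inline_space_left(size_t threshold, size_t nsegs)
{
	size_t hdr = HDR_FIXED_SIZE + chunk_list_max(nsegs);

	assert(threshold >= hdr);
	return threshold - hdr;
}
```

With these figures, chunk_list_max(MAX_HDR_SEGS) is 244 bytes and
inline_space_left(1024, MAX_HDR_SEGS) is 756 bytes, which agrees with
"over 700 bytes remaining".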
The current maximum data payload of NFS READ or WRITE requests is
one megabyte. To convey that payload on a client with 4KB pages,
each chunk segment would need to handle 32 or more data pages. This
is well within the capabilities of FMR. For physical registration,
the maximum payload size on platforms with 4KB pages is reduced to
32KB.
For FRWR, a device's maximum page list depth would need to be at
least 34 to support the maximum 1MB payload. With a device that has
a smaller maximum page list depth, the maximum data payload is
reduced.
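The effect of the cap on payload size can be sketched as follows. This
is a simplified model of the min_t() pattern in the *_op_maxpages()
functions, assuming 4KB pages and a 256-page (1MB) data limit. The names
are illustrative, not kernel identifiers. The model counts whole
page-aligned data pages only, so it does not attempt to reproduce the
FRWR depth figure of 34 quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the values follow the description above. */
#define MAX_HDR_SEGS	8u				/* mirrors RPCRDMA_MAX_HDR_SEGS */
#define PAGE_BYTES	4096u				/* assumed: 4KB pages */
#define MAX_DATA_PAGES	((1024u * 1024u) / PAGE_BYTES)	/* 1MB payload limit */

/* Largest payload, in bytes, for a registration mode that can cover
 * pages_per_seg pages with a single RDMA segment.
 */
static size_t max_payload_bytes(size_t pages_per_seg)
{
	size_t pages;

	assert(pages_per_seg > 0);
	pages = MAX_HDR_SEGS * pages_per_seg;
	if (pages > MAX_DATA_PAGES)
		pages = MAX_DATA_PAGES;
	return pages * PAGE_BYTES;
}
```

Physical registration covers one page per segment, so
max_payload_bytes(1) gives 32KB. A mode that covers 64 pages per
segment is limited by the 1MB cap rather than by the segment count.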
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/fmr_ops.c | 2 +-
net/sunrpc/xprtrdma/frwr_ops.c | 2 +-
net/sunrpc/xprtrdma/physical_ops.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 22 ----------------------
net/sunrpc/xprtrdma/xprt_rdma.h | 21 ++++++++++++++++++++-
5 files changed, 23 insertions(+), 26 deletions(-)
diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index b289e10..4aeb104 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -48,7 +48,7 @@ static size_t
fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
{
return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
- rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
+ RPCRDMA_MAX_HDR_SEGS * RPCRDMA_MAX_FMR_SGES);
}
static int
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c250924..2f37598 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -243,7 +243,7 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
- rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
+ RPCRDMA_MAX_HDR_SEGS * ia->ri_max_frmr_depth);
}
static void
diff --git a/net/sunrpc/xprtrdma/physical_ops.c b/net/sunrpc/xprtrdma/physical_ops.c
index 481b9b6..e16ed54 100644
--- a/net/sunrpc/xprtrdma/physical_ops.c
+++ b/net/sunrpc/xprtrdma/physical_ops.c
@@ -47,7 +47,7 @@ static size_t
physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
{
return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
- rpcrdma_max_segments(r_xprt));
+ RPCRDMA_MAX_HDR_SEGS);
}
static int
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index f5ed9f9..9f8d6c1 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1271,25 +1271,3 @@ out_rc:
rpcrdma_recv_buffer_put(rep);
return rc;
}
-
-/* How many chunk list items fit within our inline buffers?
- */
-unsigned int
-rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt)
-{
- struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
- int bytes, segments;
-
- bytes = min_t(unsigned int, cdata->inline_wsize, cdata->inline_rsize);
- bytes -= RPCRDMA_HDRLEN_MIN;
- if (bytes < sizeof(struct rpcrdma_segment) * 2) {
- pr_warn("RPC: %s: inline threshold too small\n",
- __func__);
- return 0;
- }
-
- segments = 1 << (fls(bytes / sizeof(struct rpcrdma_segment)) - 1);
- dprintk("RPC: %s: max chunk list size = %d segments\n",
- __func__, segments);
- return segments;
-}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 7723e5f..0028748 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -144,6 +144,26 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
#define RPCRDMA_DEF_GFP (GFP_NOIO | __GFP_NOWARN)
+/* To ensure a transport can always make forward progress,
+ * the number of RDMA segments allowed in header chunk lists
+ * is capped at 8. This prevents less-capable devices and
+ * memory registrations from overrunning the Send buffer
+ * while building chunk lists.
+ *
+ * Elements of the Read list take up more room than the
+ * Write list or Reply chunk. 8 read segments means the Read
+ * list (or Write list or Reply chunk) cannot consume more
+ * than
+ *
+ * ((8 + 2) * read segment size) + 1 XDR words, or 244 bytes.
+ *
+ * And the fixed part of the header is another 24 bytes.
+ *
+ * The smallest inline threshold is 1024 bytes, ensuring that
+ * at least 750 bytes are available for RPC messages.
+ */
+#define RPCRDMA_MAX_HDR_SEGS (8)
+
/*
* struct rpcrdma_rep -- this structure encapsulates state required to recv
* and complete a reply, asychronously. It needs several pieces of
@@ -456,7 +476,6 @@ struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);
-unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
int rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *, unsigned int);
int frwr_alloc_recovery_wq(void);