From: Chuck Lever <chuck.lever@oracle.com>
To: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: [PATCH v4 15/24] xprtrdma: Reduce the number of hardway buffer allocations
Date: Wed, 21 May 2014 20:56:25 -0400 [thread overview]
Message-ID: <20140522005625.27190.64480.stgit@manet.1015granger.net> (raw)
In-Reply-To: <20140522004505.27190.58897.stgit@manet.1015granger.net>
While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
settings determine whether an inline request is used, or whether
read or write chunk lists are built. The current default value of
these settings is 1024. Any RPC request smaller than 1024 bytes is
sent to the NFS server completely inline.
rpcrdma_buffer_create() allocates and pre-registers a set of RPC
buffers for each transport instance, also based on the inline rsize
and wsize settings.
RPC/RDMA requests and replies are built in these buffers. However,
if an RPC/RDMA request is expected to be larger than 1024 bytes, a
buffer has to be allocated and registered for that RPC, and
deregistered and released when the RPC is complete. This is known as
a "hardway allocation."
Since the introduction of NFSv4, the size of RPC requests has become
larger, and hardway allocations are thus more frequent. Hardway
allocations are significant overhead, and they waste the existing
RPC buffers pre-allocated by rpcrdma_buffer_create().
We'd like fewer hardway allocations.
Increasing the size of the pre-registered buffers is the most direct
way to do this. However, a blanket increase of the inline thresholds
has interoperability consequences.
On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
bytes for each RPC request buffer via kmalloc(). Due to internal
fragmentation, nearly 1200 bytes of each buffer are wasted: kmalloc()
satisfies a 7000-byte request with an 8192-byte piece of memory, but
the extra space remains unused.
So let's round up the size of the pre-allocated buffers, and make
use of the unused space in the kmalloc'd memory.
This change reduces the amount of hardway allocated memory for an
NFSv4 general connectathon run from 1322092 bytes to 9472 bytes (a
99% reduction).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---
net/sunrpc/xprtrdma/verbs.c | 25 +++++++++++++------------
1 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 1d08366..c80995a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -50,6 +50,7 @@
#include <linux/interrupt.h>
#include <linux/pci.h> /* for Tavor hack below */
#include <linux/slab.h>
+#include <asm/bitops.h>
#include "xprt_rdma.h"
@@ -1005,7 +1006,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
struct rpcrdma_ia *ia, struct rpcrdma_create_data_internal *cdata)
{
char *p;
- size_t len;
+ size_t len, rlen, wlen;
int i, rc;
struct rpcrdma_mw *r;
@@ -1120,16 +1121,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
* Allocate/init the request/reply buffers. Doing this
* using kmalloc for now -- one for each buf.
*/
+ wlen = 1 << fls(cdata->inline_wsize + sizeof(struct rpcrdma_req));
+ rlen = 1 << fls(cdata->inline_rsize + sizeof(struct rpcrdma_rep));
+ dprintk("RPC: %s: wlen = %zu, rlen = %zu\n",
+ __func__, wlen, rlen);
+
for (i = 0; i < buf->rb_max_requests; i++) {
struct rpcrdma_req *req;
struct rpcrdma_rep *rep;
- len = cdata->inline_wsize + sizeof(struct rpcrdma_req);
- /* RPC layer requests *double* size + 1K RPC_SLACK_SPACE! */
- /* Typical ~2400b, so rounding up saves work later */
- if (len < 4096)
- len = 4096;
- req = kmalloc(len, GFP_KERNEL);
+ req = kmalloc(wlen, GFP_KERNEL);
if (req == NULL) {
dprintk("RPC: %s: request buffer %d alloc"
" failed\n", __func__, i);
@@ -1141,16 +1142,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
buf->rb_send_bufs[i]->rl_buffer = buf;
rc = rpcrdma_register_internal(ia, req->rl_base,
- len - offsetof(struct rpcrdma_req, rl_base),
+ wlen - offsetof(struct rpcrdma_req, rl_base),
&buf->rb_send_bufs[i]->rl_handle,
&buf->rb_send_bufs[i]->rl_iov);
if (rc)
goto out;
- buf->rb_send_bufs[i]->rl_size = len-sizeof(struct rpcrdma_req);
+ buf->rb_send_bufs[i]->rl_size = wlen -
+ sizeof(struct rpcrdma_req);
- len = cdata->inline_rsize + sizeof(struct rpcrdma_rep);
- rep = kmalloc(len, GFP_KERNEL);
+ rep = kmalloc(rlen, GFP_KERNEL);
if (rep == NULL) {
dprintk("RPC: %s: reply buffer %d alloc failed\n",
__func__, i);
@@ -1162,7 +1163,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
buf->rb_recv_bufs[i]->rr_buffer = buf;
rc = rpcrdma_register_internal(ia, rep->rr_base,
- len - offsetof(struct rpcrdma_rep, rr_base),
+ rlen - offsetof(struct rpcrdma_rep, rr_base),
&buf->rb_recv_bufs[i]->rr_handle,
&buf->rb_recv_bufs[i]->rr_iov);
if (rc)
Thread overview: 27+ messages
2014-05-22 0:54 [PATCH v4 00/24] NFS/RDMA client patches for next merge Chuck Lever
2014-05-22 0:54 ` [PATCH v4 01/24] xprtrdma: mind the device's max fast register page list depth Chuck Lever
2014-05-22 0:54 ` [PATCH v4 02/24] nfs-rdma: Fix for FMR leaks Chuck Lever
2014-05-22 0:54 ` [PATCH v4 03/24] xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context Chuck Lever
2014-05-22 0:54 ` [PATCH v4 04/24] xprtrdma: Remove BOUNCEBUFFERS memory registration mode Chuck Lever
2014-05-22 0:54 ` [PATCH v4 05/24] xprtrdma: Remove MEMWINDOWS registration modes Chuck Lever
2014-05-22 0:55 ` [PATCH v4 06/24] xprtrdma: Remove REGISTER memory registration mode Chuck Lever
2014-05-22 0:55 ` [PATCH v4 07/24] xprtrdma: Fall back to MTHCAFMR when FRMR is not supported Chuck Lever
2014-05-22 0:55 ` [PATCH v4 08/24] xprtrdma: mount reports "Invalid mount option" if memreg mode " Chuck Lever
2014-05-22 0:55 ` [PATCH v4 09/24] xprtrdma: Simplify rpcrdma_deregister_external() synopsis Chuck Lever
2014-05-22 0:55 ` [PATCH v4 10/24] xprtrdma: Make rpcrdma_ep_destroy() return void Chuck Lever
2014-05-22 0:55 ` [PATCH v4 11/24] xprtrdma: Split the completion queue Chuck Lever
2014-05-22 0:55 ` [PATCH v4 12/24] xprtrmda: Reduce lock contention in completion handlers Chuck Lever
2014-05-22 0:56 ` [PATCH v4 13/24] xprtrmda: Reduce calls to ib_poll_cq() " Chuck Lever
2014-05-22 0:56 ` [PATCH v4 14/24] xprtrdma: Limit work done by completion handler Chuck Lever
2014-05-22 0:56 ` Chuck Lever [this message]
2014-05-22 0:56 ` [PATCH v4 16/24] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting Chuck Lever
2014-05-22 0:56 ` [PATCH v4 17/24] xprtrdma: Remove Tavor MTU setting Chuck Lever
2014-05-22 0:56 ` [PATCH v4 18/24] xprtrdma: Allocate missing pagelist Chuck Lever
2014-05-22 0:56 ` [PATCH v4 19/24] xprtrdma: Use macros for reconnection timeout constants Chuck Lever
2014-05-22 0:57 ` [PATCH v4 20/24] xprtrdma: Reset connection timeout after successful reconnect Chuck Lever
2014-05-22 2:07 ` Trond Myklebust
2014-05-22 3:28 ` Chuck Lever
2014-05-22 0:57 ` [PATCH v4 21/24] SUNRPC: Move congestion window contants to header file Chuck Lever
2014-05-22 0:57 ` [PATCH v4 22/24] xprtrdma: Avoid deadlock when credit window is reset Chuck Lever
2014-05-22 0:57 ` [PATCH v4 23/24] xprtrdma: Remove BUG_ON() call sites Chuck Lever
2014-05-22 0:57 ` [PATCH v4 24/24] xprtrdma: Disconnect on registration failure Chuck Lever