From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Chuck Lever <chuck.lever@oracle.com>, <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 03/16] SUNRPC: Pass callsize and recvsize to buf_alloc as separate arguments
Date: Mon, 20 Oct 2014 10:04:45 -0400
Message-ID: <5445167D.9050401@Netapp.com>
In-Reply-To: <20141016193838.13414.12691.stgit@manet.1015granger.net>

On 10/16/14 15:38, Chuck Lever wrote:
> I noticed that on RDMA, NFSv4 operations were using "hardway"
> allocations much more than not. A "hardway" allocation uses GFP_NOFS
> during each RPC to allocate the XDR buffer, instead of using a
> pre-allocated pre-registered buffer for each RPC.
>
> The pre-allocated buffers are 2200 bytes in length. The requested
> XDR buffer sizes looked like this:
>
> GETATTR: 3220 bytes
> LOOKUP: 3612 bytes
> WRITE: 3256 bytes
> OPEN: 6344 bytes
>
> But an NFSv4 GETATTR RPC request should be small. It's the reply
> part of GETATTR that can grow large.
>
> call_allocate() passes a single value as the XDR buffer size: the
> sum of call and reply buffers. However, the xprtrdma transport
> allocates its XDR request and reply buffers separately.
>
> xprtrdma needs to know the maximum call size, as guidance for how
> large the outgoing request is going to be and how the NFS payload
> will be marshalled into chunks.
>
> But RDMA XDR reply buffers are pre-posted, fixed-size buffers, not
> allocated by xprt_rdma_allocate().
>
> Because of the sum passed through ->buf_alloc(), xprtrdma's
> ->buf_alloc() always allocates more XDR buffer than it will ever
> use. For NFSv4, it is unnecessarily triggering the slow "hardway"
> path for almost every RPC.
>
> Pass the call and reply buffer size values separately to the
> transport's ->buf_alloc method. The RDMA transport ->buf_alloc can
> now ignore the reply size, and allocate just what it will use for
> the call buffer. The socket transport ->buf_alloc can simply add
> them together, as call_allocate() did before.
>
> With this patch, an NFSv4 GETATTR request now allocates a 476 byte
> RDMA XDR buffer. I didn't see a single NFSv4 request that did not
> fit into the transport's pre-allocated XDR buffer.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> include/linux/sunrpc/sched.h | 2 +-
> include/linux/sunrpc/xprt.h | 3 ++-
> net/sunrpc/clnt.c | 4 ++--
> net/sunrpc/sched.c | 6 ++++--
> net/sunrpc/xprtrdma/transport.c | 2 +-
> net/sunrpc/xprtsock.c | 3 ++-
> 6 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index 1a89599..68fa71d 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -232,7 +232,7 @@ struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
> void *);
> void rpc_wake_up_status(struct rpc_wait_queue *, int);
> void rpc_delay(struct rpc_task *, unsigned long);
> -void * rpc_malloc(struct rpc_task *, size_t);
> +void *rpc_malloc(struct rpc_task *, size_t, size_t);
> void rpc_free(void *);
> int rpciod_up(void);
> void rpciod_down(void);
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index fcbfe87..632685c 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -124,7 +124,8 @@ struct rpc_xprt_ops {
> void (*rpcbind)(struct rpc_task *task);
> void (*set_port)(struct rpc_xprt *xprt, unsigned short port);
> void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
> - void * (*buf_alloc)(struct rpc_task *task, size_t size);
> + void * (*buf_alloc)(struct rpc_task *task,
> + size_t call, size_t reply);
> void (*buf_free)(void *buffer);
> int (*send_request)(struct rpc_task *task);
> void (*set_retrans_timeout)(struct rpc_task *task);
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 488ddee..5e817d6 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1599,8 +1599,8 @@ call_allocate(struct rpc_task *task)
> req->rq_rcvsize = RPC_REPHDRSIZE + slack + proc->p_replen;
> req->rq_rcvsize <<= 2;
>
> - req->rq_buffer = xprt->ops->buf_alloc(task,
> - req->rq_callsize + req->rq_rcvsize);
> + req->rq_buffer = xprt->ops->buf_alloc(task, req->rq_callsize,
> + req->rq_rcvsize);
> if (req->rq_buffer != NULL)
> return;
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 9358c79..fc4f939 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -829,7 +829,8 @@ static void rpc_async_schedule(struct work_struct *work)
> /**
> * rpc_malloc - allocate an RPC buffer
> * @task: RPC task that will use this buffer
> - * @size: requested byte size
> + * @call: maximum size of on-the-wire RPC call, in bytes
> + * @reply: maximum size of on-the-wire RPC reply, in bytes
> *
> * To prevent rpciod from hanging, this allocator never sleeps,
> * returning NULL and suppressing warning if the request cannot be serviced
> @@ -843,8 +844,9 @@ static void rpc_async_schedule(struct work_struct *work)
> * In order to avoid memory starvation triggering more writebacks of
> * NFS requests, we avoid using GFP_KERNEL.
> */
> -void *rpc_malloc(struct rpc_task *task, size_t size)
> +void *rpc_malloc(struct rpc_task *task, size_t call, size_t reply)
> {
> + size_t size = call + reply;
> struct rpc_buffer *buf;
> gfp_t gfp = GFP_NOWAIT | __GFP_NOWARN;
>
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index 2faac49..6e9d0a7 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -459,7 +459,7 @@ xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
> * the receive buffer portion when using reply chunks.
> */
> static void *
> -xprt_rdma_allocate(struct rpc_task *task, size_t size)
> +xprt_rdma_allocate(struct rpc_task *task, size_t size, size_t replen)

The comment right before this function mentions that send and receive buffers are allocated in the same call. Can you update this?

Anna

> {
> struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
> struct rpcrdma_req *req, *nreq;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 43cd89e..b4aca48 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2423,8 +2423,9 @@ static void xs_tcp_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
> * we allocate pages instead doing a kmalloc like rpc_malloc is because we want
> * to use the server side send routines.
> */
> -static void *bc_malloc(struct rpc_task *task, size_t size)
> +static void *bc_malloc(struct rpc_task *task, size_t call, size_t reply)
> {
> + size_t size = call + reply;
> struct page *page;
> struct rpc_buffer *buf;
>
>
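
For readers following the sizing argument in the patch description, here is a minimal user-space sketch of the two allocation policies. It is not kernel code and not part of the patch: sock_buf_size(), rdma_buf_size(), needs_hardway() and PREALLOC_BUF_SIZE are hypothetical names made up for illustration. The GETATTR figures in the usage note assume the 476 bytes reported after the patch is the call portion of the 3220-byte total reported before it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the 2200-byte pre-allocated buffer
 * mentioned in the patch description. */
#define PREALLOC_BUF_SIZE 2200

/* Socket-style policy: one buffer holds both call and reply,
 * so the two sizes are summed, as call_allocate() did before. */
static size_t sock_buf_size(size_t call, size_t reply)
{
	return call + reply;
}

/* RDMA-style policy: replies land in pre-posted, fixed-size
 * buffers, so only the call size matters for this allocation. */
static size_t rdma_buf_size(size_t call, size_t reply)
{
	(void)reply;	/* deliberately ignored */
	return call;
}

/* Nonzero when a request does not fit the pre-allocated buffer
 * and would fall back to a per-RPC ("hardway") allocation. */
static int needs_hardway(size_t size)
{
	return size > PREALLOC_BUF_SIZE;
}
```

Under that assumption, a GETATTR with call = 476 and reply = 2744 asks for 3220 bytes under the summed policy, which exceeds 2200 and takes the hardway path. Under the call-only policy it asks for 476 bytes, which fits the pre-allocated buffer.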
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html