public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Chuck Lever <chuck.lever@oracle.com>, <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 03/16] SUNRPC: Pass callsize and recvsize to buf_alloc as separate arguments
Date: Mon, 20 Oct 2014 10:04:45 -0400	[thread overview]
Message-ID: <5445167D.9050401@Netapp.com> (raw)
In-Reply-To: <20141016193838.13414.12691.stgit@manet.1015granger.net>

On 10/16/14 15:38, Chuck Lever wrote:
> I noticed that on RDMA, NFSv4 operations were using "hardway"
> allocations much more than not. A "hardway" allocation uses GFP_NOFS
> during each RPC to allocate the XDR buffer, instead of using a
> pre-allocated pre-registered buffer for each RPC.
>
> The pre-allocated buffers are 2200 bytes in length.  The requested
> XDR buffer sizes looked like this:
>
>     GETATTR: 3220 bytes
>     LOOKUP:  3612 bytes
>     WRITE:   3256 bytes
>     OPEN:    6344 bytes
>
> But an NFSv4 GETATTR RPC request should be small. It's the reply
> part of GETATTR that can grow large.
>
> call_allocate() passes a single value as the XDR buffer size: the
> sum of call and reply buffers. However, the xprtrdma transport
> allocates its XDR request and reply buffers separately.
>
> xprtrdma needs to know the maximum call size, as guidance for how
> large the outgoing request is going to be and how the NFS payload
> will be marshalled into chunks.
>
> But RDMA XDR reply buffers are pre-posted, fixed-size buffers, not
> allocated by xprt_rdma_allocate().
>
> Because of the sum passed through ->buf_alloc(), xprtrdma's
> ->buf_alloc() always allocates more XDR buffer than it will ever
> use. For NFSv4, it is unnecessarily triggering the slow "hardway"
> path for almost every RPC.
>
> Pass the call and reply buffer size values separately to the
> transport's ->buf_alloc method.  The RDMA transport ->buf_alloc can
> now ignore the reply size, and allocate just what it will use for
> the call buffer.  The socket transport ->buf_alloc can simply add
> them together, as call_allocate() did before.
>
> With this patch, an NFSv4 GETATTR request now allocates a 476 byte
> RDMA XDR buffer. I didn't see a single NFSv4 request that did not
> fit into the transport's pre-allocated XDR buffer.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  include/linux/sunrpc/sched.h    |    2 +-
>  include/linux/sunrpc/xprt.h     |    3 ++-
>  net/sunrpc/clnt.c               |    4 ++--
>  net/sunrpc/sched.c              |    6 ++++--
>  net/sunrpc/xprtrdma/transport.c |    2 +-
>  net/sunrpc/xprtsock.c           |    3 ++-
>  6 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index 1a89599..68fa71d 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -232,7 +232,7 @@ struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>  					void *);
>  void		rpc_wake_up_status(struct rpc_wait_queue *, int);
>  void		rpc_delay(struct rpc_task *, unsigned long);
> -void *		rpc_malloc(struct rpc_task *, size_t);
> +void		*rpc_malloc(struct rpc_task *, size_t, size_t);
>  void		rpc_free(void *);
>  int		rpciod_up(void);
>  void		rpciod_down(void);
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index fcbfe87..632685c 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -124,7 +124,8 @@ struct rpc_xprt_ops {
>  	void		(*rpcbind)(struct rpc_task *task);
>  	void		(*set_port)(struct rpc_xprt *xprt, unsigned short port);
>  	void		(*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
> -	void *		(*buf_alloc)(struct rpc_task *task, size_t size);
> +	void *		(*buf_alloc)(struct rpc_task *task,
> +					size_t call, size_t reply);
>  	void		(*buf_free)(void *buffer);
>  	int		(*send_request)(struct rpc_task *task);
>  	void		(*set_retrans_timeout)(struct rpc_task *task);
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 488ddee..5e817d6 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1599,8 +1599,8 @@ call_allocate(struct rpc_task *task)
>  	req->rq_rcvsize = RPC_REPHDRSIZE + slack + proc->p_replen;
>  	req->rq_rcvsize <<= 2;
>  
> -	req->rq_buffer = xprt->ops->buf_alloc(task,
> -					req->rq_callsize + req->rq_rcvsize);
> +	req->rq_buffer = xprt->ops->buf_alloc(task, req->rq_callsize,
> +					      req->rq_rcvsize);
>  	if (req->rq_buffer != NULL)
>  		return;
>  
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 9358c79..fc4f939 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -829,7 +829,8 @@ static void rpc_async_schedule(struct work_struct *work)
>  /**
>   * rpc_malloc - allocate an RPC buffer
>   * @task: RPC task that will use this buffer
> - * @size: requested byte size
> + * @call: maximum size of on-the-wire RPC call, in bytes
> + * @reply: maximum size of on-the-wire RPC reply, in bytes
>   *
>   * To prevent rpciod from hanging, this allocator never sleeps,
>   * returning NULL and suppressing warning if the request cannot be serviced
> @@ -843,8 +844,9 @@ static void rpc_async_schedule(struct work_struct *work)
>   * In order to avoid memory starvation triggering more writebacks of
>   * NFS requests, we avoid using GFP_KERNEL.
>   */
> -void *rpc_malloc(struct rpc_task *task, size_t size)
> +void *rpc_malloc(struct rpc_task *task, size_t call, size_t reply)
>  {
> +	size_t size = call + reply;
>  	struct rpc_buffer *buf;
>  	gfp_t gfp = GFP_NOWAIT | __GFP_NOWARN;
>  
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index 2faac49..6e9d0a7 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -459,7 +459,7 @@ xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
>   * the receive buffer portion when using reply chunks.
>   */
>  static void *
> -xprt_rdma_allocate(struct rpc_task *task, size_t size)
> +xprt_rdma_allocate(struct rpc_task *task, size_t size, size_t replen)

The comment right before this function mentions that send and receive buffers are allocated in the same call.  Can you update this?

Anna

>  {
>  	struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
>  	struct rpcrdma_req *req, *nreq;
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 43cd89e..b4aca48 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2423,8 +2423,9 @@ static void xs_tcp_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
>   * we allocate pages instead doing a kmalloc like rpc_malloc is because we want
>   * to use the server side send routines.
>   */
> -static void *bc_malloc(struct rpc_task *task, size_t size)
> +static void *bc_malloc(struct rpc_task *task, size_t call, size_t reply)
>  {
> +	size_t size = call + reply;
>  	struct page *page;
>  	struct rpc_buffer *buf;
>  
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2014-10-20 14:04 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-16 19:38 [PATCH v1 00/16] NFS/RDMA patches for 3.19 Chuck Lever
2014-10-16 19:38 ` [PATCH v1 01/16] xprtrdma: Return an errno from rpcrdma_register_external() Chuck Lever
2014-10-16 19:38 ` [PATCH v1 02/16] xprtrdma: Cap req_cqinit Chuck Lever
2014-10-20 13:27   ` Anna Schumaker
2014-10-16 19:38 ` [PATCH v1 03/16] SUNRPC: Pass callsize and recvsize to buf_alloc as separate arguments Chuck Lever
2014-10-20 14:04   ` Anna Schumaker [this message]
2014-10-20 18:21     ` Chuck Lever
2014-10-16 19:38 ` [PATCH v1 04/16] xprtrdma: Re-write rpcrdma_flush_cqs() Chuck Lever
2014-10-16 19:38 ` [PATCH v1 05/16] xprtrdma: unmap all FMRs during transport disconnect Chuck Lever
2014-10-16 19:39 ` [PATCH v1 06/16] xprtrdma: spin CQ completion vectors Chuck Lever
2014-10-16 19:39 ` [PATCH v1 07/16] SUNRPC: serialize iostats updates Chuck Lever
2014-10-16 19:39 ` [PATCH v1 08/16] xprtrdma: Display async errors Chuck Lever
2014-10-16 19:39 ` [PATCH v1 09/16] xprtrdma: Enable pad optimization Chuck Lever
2014-10-16 19:39 ` [PATCH v1 10/16] NFS: Include transport protocol name in UCS client string Chuck Lever
2014-10-16 19:39 ` [PATCH v1 11/16] NFS: Clean up nfs4_init_callback() Chuck Lever
2014-10-16 19:39 ` [PATCH v1 12/16] SUNRPC: Add rpc_xprt_is_bidirectional() Chuck Lever
2014-10-16 19:40 ` [PATCH v1 13/16] NFS: Add sidecar RPC client support Chuck Lever
2014-10-20 17:33   ` Anna Schumaker
2014-10-20 18:09     ` Chuck Lever
2014-10-20 19:40       ` Trond Myklebust
2014-10-20 20:11         ` Chuck Lever
2014-10-20 22:31           ` Trond Myklebust
2014-10-21  1:06             ` Chuck Lever
2014-10-21  7:45               ` Trond Myklebust
2014-10-21 17:11                 ` Chuck Lever
2014-10-22  8:39                   ` Trond Myklebust
2014-10-22 17:20                     ` Chuck Lever
2014-10-22 20:53                       ` Trond Myklebust
2014-10-22 22:38                         ` Chuck Lever
2014-10-23 13:32                   ` J. Bruce Fields
2014-10-23 13:55                     ` Chuck Lever
2014-10-16 19:40 ` [PATCH v1 14/16] NFS: Set BIND_CONN_TO_SESSION arguments in the proc layer Chuck Lever
2014-10-16 19:40 ` [PATCH v1 15/16] NFS: Bind side-car connection to session Chuck Lever
2014-10-16 19:40 ` [PATCH v1 16/16] NFS: Disable SESSION4_BACK_CHAN when a backchannel sidecar is to be used Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5445167D.9050401@Netapp.com \
    --to=anna.schumaker@netapp.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox