linux-nfs.vger.kernel.org archive mirror
From: Devesh Sharma <devesh.sharma@broadcom.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 1/8] xprtrdma: Segment head and tail XDR buffers on page boundaries
Date: Mon, 15 Feb 2016 19:57:52 +0530
Message-ID: <CANjDDBi4ejPK21LsN21LcR2d6AVdsgw=G8pwWb1MLnmzDP8-0w@mail.gmail.com>
In-Reply-To: <20160212210602.5278.57457.stgit@manet.1015granger.net>

Looks good.

On Sat, Feb 13, 2016 at 2:36 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> A single memory allocation is used for the pair of buffers wherein
> the RPC client builds an RPC call message and decodes its matching
> reply. These buffers are sized based on the maximum possible size
> of the RPC call and reply messages for the operation in progress.
>
> This means that as the call buffer increases in size, the start of
> the reply buffer is pushed farther into the memory allocation.
>
> RPC requests are growing in size. It used to be that both the call
> and reply buffers fit inside a single page.
>
> But these days, thanks to NFSv4 (and especially security labels in
> NFSv4.2), the maximum call and reply sizes are large. NFSv4.0 OPEN,
> for example, now requires a 6KB allocation for a pair of call and
> reply buffers, and NFSv4 LOOKUP is not far behind.
>
> As the maximum size of a call increases, the reply buffer is pushed
> far enough into the buffer's memory allocation that a page boundary
> can appear in the middle of it.
>
> When the maximum possible reply size is larger than the client's
> RDMA receive buffers (currently 1KB), the client has to register a
> Reply chunk for the server to RDMA Write the reply into.
>
> The logic in rpcrdma_convert_iovs() assumes that the xdr_buf head
> and tail buffers are always contained within a single page. It
> supplies just one segment for the head and one for the tail.
>
> FMR, for example, registers up to a page boundary (only a portion of
> the reply buffer in the OPEN case above). But without additional
> segments, it doesn't register the rest of the buffer.
>
> When the server tries to write the OPEN reply, the RDMA Write fails
> with a remote access error since the client registered only part of
> the Reply chunk.
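
To put hypothetical numbers on that (assuming a 4KB PAGE_SIZE): suppose the
reply buffer begins at byte 3000 of a page and the OPEN reply runs past the
end of that page. With a single segment, FMR registers only up to the page
boundary:

    registered:      page offsets 3000..4095  (1096 bytes)
    not registered:  everything past the boundary

so the tail end of the Reply chunk is simply missing from the registration.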
>
> rpcrdma_convert_iovs() must split the XDR buffer into multiple
> segments, each of which is guaranteed not to contain a page
> boundary. That way fmr_op_map() is given the proper number of
> segments to register the whole reply buffer.
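
A concrete walk-through of the split (made-up numbers, assuming a 4KB
PAGE_SIZE): a 6000-byte kvec whose iov_base falls at page offset 3000 comes
out of the new rpcrdma_convert_kvec() loop below as three segments, none of
which crosses a page boundary:

    iteration 1: mr_len = min(4096 - 3000, 6000) = 1096, 4904 bytes left
    iteration 2: mr_len = min(4096 -    0, 4904) = 4096,  808 bytes left
    iteration 3: mr_len = min(4096 -    0,  808) =  808,    0 bytes left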
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/rpc_rdma.c |   42 ++++++++++++++++++++++++++++++----------
>  1 file changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 0f28f2d..add1f98 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -132,6 +132,33 @@ rpcrdma_tail_pullup(struct xdr_buf *buf)
>         return tlen;
>  }
>
> +/* Split "vec" on page boundaries into segments. FMR registers pages,
> + * not a byte range. Other modes coalesce these segments into a single
> + * MR when they can.
> + */
> +static int
> +rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg,
> +                    int n, int nsegs)
> +{
> +       size_t page_offset;
> +       u32 remaining;
> +       char *base;
> +
> +       base = vec->iov_base;
> +       page_offset = offset_in_page(base);
> +       remaining = vec->iov_len;
> +       while (remaining && n < nsegs) {
> +               seg[n].mr_page = NULL;
> +               seg[n].mr_offset = base;
> +               seg[n].mr_len = min_t(u32, PAGE_SIZE - page_offset, remaining);
> +               remaining -= seg[n].mr_len;
> +               base += seg[n].mr_len;
> +               ++n;
> +               page_offset = 0;
> +       }
> +       return n;
> +}
> +
>  /*
>   * Chunk assembly from upper layer xdr_buf.
>   *
> @@ -150,11 +177,10 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
>         int page_base;
>         struct page **ppages;
>
> -       if (pos == 0 && xdrbuf->head[0].iov_len) {
> -               seg[n].mr_page = NULL;
> -               seg[n].mr_offset = xdrbuf->head[0].iov_base;
> -               seg[n].mr_len = xdrbuf->head[0].iov_len;
> -               ++n;
> +       if (pos == 0) {
> +               n = rpcrdma_convert_kvec(&xdrbuf->head[0], seg, n, nsegs);
> +               if (n == nsegs)
> +                       return -EIO;
>         }
>
>         len = xdrbuf->page_len;
> @@ -192,13 +218,9 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
>                  * xdr pad bytes, saving the server an RDMA operation. */
>                 if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
>                         return n;
> +               n = rpcrdma_convert_kvec(&xdrbuf->tail[0], seg, n, nsegs);
>                 if (n == nsegs)
> -                       /* Tail remains, but we're out of segments */
>                         return -EIO;
> -               seg[n].mr_page = NULL;
> -               seg[n].mr_offset = xdrbuf->tail[0].iov_base;
> -               seg[n].mr_len = xdrbuf->tail[0].iov_len;
> -               ++n;
>         }
>
>         return n;
>
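
For anyone who wants to sanity-check the segment arithmetic outside the
kernel, here is a minimal user-space sketch of the same splitting loop
(stand-in names and a hard-coded 4KB page size; it only mirrors the logic
of rpcrdma_convert_kvec() above, it is not the kernel code):

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Split a byte range into segments that never cross a page boundary,
 * the same way rpcrdma_convert_kvec() does for a kvec.
 */
static int split_on_page_boundaries(uintptr_t base, uint32_t remaining)
{
    size_t page_offset = base & (SKETCH_PAGE_SIZE - 1);
    int n = 0;

    while (remaining) {
        uint32_t len = SKETCH_PAGE_SIZE - page_offset;

        if (len > remaining)
            len = remaining;
        printf("seg[%d]: page offset %zu, len %u\n", n, page_offset, len);
        remaining -= len;
        base += len;
        page_offset = 0;
        n++;
    }
    return n;
}

int main(void)
{
    /* Hypothetical kvec: 6000 bytes starting at page offset 3000.
     * Expected output: segments of 1096, 4096, and 808 bytes.
     */
    split_on_page_boundaries(3000, 6000);
    return 0;
}

Running it prints the three segments from the walk-through above.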


Thread overview: 22+ messages
2016-02-12 21:05 [PATCH v1 0/8] NFS/RDMA client patches for v4.6 Chuck Lever
2016-02-12 21:06 ` [PATCH v1 1/8] xprtrdma: Segment head and tail XDR buffers on page boundaries Chuck Lever
2016-02-15 14:27   ` Devesh Sharma [this message]
2016-02-12 21:06 ` [PATCH v1 2/8] xprtrdma: Invalidate memory when a signal is caught Chuck Lever
2016-02-15 14:28   ` Devesh Sharma
2016-02-12 21:06 ` [PATCH v1 3/8] rpcrdma: Add RPCRDMA_HDRLEN_ERR Chuck Lever
2016-02-15 14:28   ` Devesh Sharma
2016-02-12 21:06 ` [PATCH v1 4/8] xprtrdma: Properly handle RDMA_ERROR replies Chuck Lever
2016-02-15 14:28   ` Devesh Sharma
2016-02-17 21:19   ` Anna Schumaker
2016-02-17 21:21     ` Chuck Lever
2016-02-17 21:24       ` Anna Schumaker
2016-02-12 21:06 ` [PATCH v1 5/8] xprtrdma: Serialize credit accounting again Chuck Lever
2016-02-15 14:29   ` Devesh Sharma
2016-02-15 15:00     ` Chuck Lever
2016-02-16  5:15       ` Devesh Sharma
2016-02-12 21:06 ` [PATCH v1 6/8] xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs Chuck Lever
2016-02-15 14:29   ` Devesh Sharma
2016-02-12 21:06 ` [PATCH v1 7/8] xprtrdma: Use an anonymous union in struct rpcrdma_mw Chuck Lever
2016-02-15 14:30   ` Devesh Sharma
2016-02-12 21:07 ` [PATCH v1 8/8] xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs Chuck Lever
2016-02-15 14:31 ` [PATCH v1 0/8] NFS/RDMA client patches for v4.6 Devesh Sharma
