From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Anna Schumaker <Anna.Schumaker@netapp.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v2 2/4] NFSD: Add READ_PLUS support for data segments
Date: Fri, 6 Feb 2015 17:01:20 -0500
Message-ID: <20150206220120.GI29783@fieldses.org>
In-Reply-To: <B1A0C74D-725B-43B6-8325-7A70E189CF2A@oracle.com>

On Fri, Feb 06, 2015 at 04:12:51PM -0500, Chuck Lever wrote:
>
> On Feb 6, 2015, at 3:28 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> > On Fri, Feb 06, 2015 at 03:07:08PM -0500, Chuck Lever wrote:
> >>
> >> On Feb 6, 2015, at 2:35 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>>>
> >>>> Small replies are sent inline. There is a size maximum for inline
> >>>> messages, however. I guess RFC 5667 section 5 assumes this context, which
> >>>> appears throughout RFC 5666.
> >>>>
> >>>> If an expected reply exceeds the inline size, then a client will
> >>>> set up a reply list for the server. A memory region on the client is
> >>>> registered as a target for RDMA WRITE operations, and the co-ordinates
> >>>> of that region are sent to the server in the RPC call.
> >>>>
> >>>> If the server finds the reply will indeed be larger than the inline
> >>>> maximum, it plants the reply in the client memory region described by
> >>>> the request’s reply list, and repeats the co-ordinates of that region
> >>>> back to the client in the RPC reply.
> >>>>
> >>>> A server may also choose to send a small reply inline, even if the
> >>>> client provided a reply list. In that case, the server does not
> >>>> repeat the reply list in the reply, and the full reply appears
> >>>> inline.
> >>>>
> >>>> Linux registers part of the RPC reply buffer for the reply list. After
> >>>> it is received on the client, the reply payload is copied by the client
> >>>> CPU to its final destination.
> >>>>
> >>>> Inline and reply list are the mechanisms used when the upper layer
> >>>> has some processing to do to the incoming data (e.g. READDIR). When
> >>>> a request just needs raw data to be simply dropped off in the client’s
> >>>> memory, then the write list is preferred. A write list is basically a
> >>>> zero-copy I/O.
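The choices described in the quoted explanation above (inline, write list, reply chunk) can be sketched as a small decision function. This is a minimal C sketch under stated assumptions: `choose_reply_method`, `struct reply_ctx`, the enum names, and the policy ordering (write list first for raw data payloads, then inline, then the reply chunk) are hypothetical and are not taken from the Linux client or server code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names and policy; not from the Linux RPC/RDMA code. */
enum reply_method {
    REPLY_INLINE,      /* entire RPC reply sent inline (RDMA_MSG) */
    REPLY_WRITE_LIST,  /* raw payload RDMA-written via the client's write
                          list; the rest of the reply is sent inline */
    REPLY_REPLY_CHUNK, /* entire RPC reply RDMA-written into the client's
                          reply chunk; header sent as RDMA_NOMSG */
    REPLY_TOO_LARGE    /* exceeds the inline maximum and the client
                          posted no usable chunk */
};

struct reply_ctx {
    size_t reply_len;      /* length of the complete XDR-encoded RPC reply */
    size_t bulk_len;       /* bytes of raw data payload, e.g. READ data */
    size_t inline_max;     /* maximum inline message size */
    bool has_write_list;   /* client posted a write list in the call */
    bool has_reply_chunk;  /* client posted a reply chunk in the call */
};

enum reply_method choose_reply_method(const struct reply_ctx *c)
{
    /* The raw payload is part of the reply, never larger than it. */
    assert(c->bulk_len <= c->reply_len);

    /* Raw data destined for client memory: prefer zero-copy via the
       write list, provided what remains of the reply fits inline. */
    if (c->has_write_list && c->bulk_len > 0 &&
        c->reply_len - c->bulk_len <= c->inline_max)
        return REPLY_WRITE_LIST;

    /* Small replies go inline, even if the client offered a reply chunk. */
    if (c->reply_len <= c->inline_max)
        return REPLY_INLINE;

    /* Large replies the upper layer must process (e.g. READDIR) are
       RDMA-written into the client's reply chunk. */
    if (c->has_reply_chunk)
        return REPLY_REPLY_CHUNK;

    return REPLY_TOO_LARGE;
}
```

For example, a 64 KB NFS READ whose remaining reply fits inline maps to REPLY_WRITE_LIST, while an 8000-byte READDIR reply with a client-provided reply chunk maps to REPLY_REPLY_CHUNK.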
> >>>
> >>> The term "reply list" doesn't appear in either RFC. I believe you mean
> >>> "client-posted write list" in most of the above, except this last
> >>> paragraph, which should have started with "Inline and server-posted read list..."?
> >>
> >> No, I meant “reply list.” Definitely not read list.
> >>
> >> The terms used in the RFCs and the implementations vary,
> >
> > OK. Would you mind defining the term "reply list" for me? Google's not helping.
>
> Let’s look at section 4.3 of RFC 5666. Each RPC/RDMA header begins
> with this:
>
> struct rdma_msg {
>     uint32    rdma_xid;      /* Mirrors the RPC header xid */
>     uint32    rdma_vers;     /* Version of this protocol */
>     uint32    rdma_credit;   /* Buffers requested/granted */
>     rdma_body rdma_body;
> };
>
> rdma_body starts with a uint32 which discriminates a union:
>
> union rdma_body switch (rdma_proc proc) {
>     . . .
>     case RDMA_NOMSG:
>         rpc_rdma_header_nomsg rdma_nomsg;
>     . . .
> };
>
> When “proc” == RDMA_NOMSG, rdma_body is made up of three lists:
>
> struct rpc_rdma_header_nomsg {
>     struct xdr_read_list   *rdma_reads;
>     struct xdr_write_list  *rdma_writes;
>     struct xdr_write_chunk *rdma_reply;
> };
>
> The “reply list” is that last part: rdma_reply, which is a counted
> array of xdr_rdma_segment structures.
>
> Large replies for non-NFS READ operations are sent using RDMA_NOMSG.
> The RPC/RDMA header is sent as the inline portion of the message.
> The RPC reply message (the part we are all familiar with) is planted
> in the memory region described by rdma_reply; it is not sent inline.
>
> rdma_reply is a write chunk. The server WRITEs its RPC reply into the
> memory region described by rdma_reply. That description was provided
> by the client in the matching RPC call message.
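To make the wire layout described above concrete, here is a minimal C sketch that XDR-encodes an RDMA_NOMSG header with empty read and write lists and a reply chunk. It assumes the RFC 5666 XDR rules: big-endian 32-bit words, optional-data pointers encoded as a 0/1 discriminant, a 64-bit hyper as two words with the high word first, RDMA_NOMSG equal to 1, and protocol version 1. `encode_nomsg_header`, `put32`, `put64`, and `struct rdma_segment` are hypothetical helper names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RPCRDMA_VERSION 1   /* RPC/RDMA protocol version 1 */
#define RDMA_NOMSG      1   /* rdma_proc value for RDMA_NOMSG */

/* Hypothetical host-side mirror of xdr_rdma_segment. */
struct rdma_segment {
    uint32_t handle;   /* registered memory handle */
    uint32_t length;   /* length of the segment in bytes */
    uint64_t offset;   /* segment virtual address or offset */
};

/* Store a 32-bit value in XDR (big-endian) byte order. */
static uint8_t *put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

/* XDR encodes a hyper as two 32-bit words, most significant first. */
static uint8_t *put64(uint8_t *p, uint64_t v)
{
    p = put32(p, (uint32_t)(v >> 32));
    return put32(p, (uint32_t)v);
}

/* Encode an RDMA_NOMSG header whose rdma_reply carries nsegs segments.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_nomsg_header(uint8_t *buf, size_t buflen, uint32_t xid,
                           uint32_t credits,
                           const struct rdma_segment *segs, uint32_t nsegs)
{
    /* 8 fixed words plus 4 words (handle, length, offset) per segment. */
    size_t need = (8 + 4 * (size_t)nsegs) * 4;
    uint8_t *p = buf;

    if (buflen < need)
        return 0;

    p = put32(p, xid);              /* rdma_xid */
    p = put32(p, RPCRDMA_VERSION);  /* rdma_vers */
    p = put32(p, credits);          /* rdma_credit */
    p = put32(p, RDMA_NOMSG);       /* rdma_body discriminant */
    p = put32(p, 0);                /* rdma_reads: absent */
    p = put32(p, 0);                /* rdma_writes: absent */
    p = put32(p, 1);                /* rdma_reply: present */
    p = put32(p, nsegs);            /* counted array length */
    for (uint32_t i = 0; i < nsegs; i++) {
        p = put32(p, segs[i].handle);
        p = put32(p, segs[i].length);
        p = put64(p, segs[i].offset);
    }
    assert((size_t)(p - buf) == need);
    return need;
}
```

With one segment the header occupies 12 XDR words (48 bytes). Only this header travels inline; the RPC reply itself is RDMA-written into the memory the segments describe.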

Thanks! Gah, my apologies; obviously I didn't understand the reference
to section 5.2 before. I think I understand now....

And I'll be interested to see what we come up with for the READ_PLUS case.
--b.