public inbox for linux-nfs@vger.kernel.org
From: Sagi Grimberg <sagig@dev.mellanox.co.il>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 07/12] xprtrdma: Don't provide a reply chunk when expecting a short reply
Date: Tue, 14 Jul 2015 12:54:39 +0300
Message-ID: <55A4DC5F.9090403@dev.mellanox.co.il>
In-Reply-To: <2EB8EA33-9345-4D18-8BE1-39C4EB2658E2@oracle.com>

On 7/12/2015 9:38 PM, Chuck Lever wrote:
> Hi Sagi-
>
>
> On Jul 12, 2015, at 10:58 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>
>> On 7/9/2015 11:42 PM, Chuck Lever wrote:
>>> Currently Linux always offers a reply chunk, even for small replies
>>> (unless a read or write list is needed for the RPC operation).
>>>
>>> A comment in rpcrdma_marshal_req() reads:
>>>
>>>> Currently we try to not actually use read inline.
>>>> Reply chunks have the desirable property that
>>>> they land, packed, directly in the target buffers
>>>> without headers, so they require no fixup. The
>>>> additional RDMA Write op sends the same amount
>>>> of data, streams on-the-wire and adds no overhead
>>>> on receive. Therefore, we request a reply chunk
>>>> for non-writes wherever feasible and efficient.
>>>
>>> This considers only the network bandwidth cost of sending the RPC
>>> reply. For replies which are only a few dozen bytes, this is
>>> typically not a good trade-off.
>>>
>>> If the server chooses to return the reply inline:
>>>
>>>   - The client has registered and invalidated a memory region to
>>>     catch the reply, which is then not used
>>>
>>> If the server chooses to use the reply chunk:
>>>
>>>   - The server sends a few bytes using a heavyweight RDMA WRITE
>>>     operation. The entire RPC reply is conveyed in two RDMA
>>>     operations (WRITE_ONLY, SEND) instead of one.
>>
>> Pipelined WRITE+SEND operations are hardly an overhead compared to
>> copying chunks of data.
>>
>>>
>>> Note that both the server and client have to prepare or copy the
>>> reply data anyway to construct these replies. There's no benefit to
>>> using an RDMA transfer since the host CPU has to be involved.
>>
>> I think the cost of preparation (posting 1 or 2 WQEs) and the cost
>> of copying chunks of data of, say, 8K-16K are quite different.
>
> Two points that are probably not clear from my patch description:
>
> 1. This patch affects only replies (usually much) smaller than the
>     client’s inline threshold (1KB). Anything larger will continue
>     to use RDMA transfer.
>
> 2. These replies are constructed in the RPC buffer by the server,
>     and parsed in the receive buffer by the client. They are not
>     simple data copies on either endpoint.
>
> Think NFS GETATTR: the server is gathering metadata from multiple
> sources, and XDR encoding it in the reply send buffer. The data
> is not copied, it is manipulated before the SEND.
>
> The client then XDR decodes the received stream and scatters the
> decoded results into multiple in-memory data structures.
>
> Because XDR encoding/decoding is involved, there really is no
> benefit to an RDMA transfer for these replies.

I see. Thanks for the clarification.
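
For readers following the thread, the decision described in point 1
above can be sketched roughly as follows. This is a minimal
illustration, not the code from the patch: the function name
want_reply_chunk() and its parameters are hypothetical, and the
1024-byte figure in the usage note is simply the 1KB inline threshold
mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of xprtrdma): decide whether the client
 * should register memory and offer a reply chunk for an RPC.
 *
 * max_reply_len    - upper bound on the size of the expected RPC reply
 * inline_threshold - largest reply the server can return in a plain SEND
 */
static bool want_reply_chunk(size_t max_reply_len, size_t inline_threshold)
{
	assert(inline_threshold > 0);

	/*
	 * A short reply fits inline: skip the memory registration and
	 * let the server answer with a single SEND.
	 */
	if (max_reply_len <= inline_threshold)
		return false;

	/* A larger reply still goes through an RDMA transfer. */
	return true;
}
```

With a 1024-byte threshold, a GETATTR-sized reply of around a hundred
bytes would be returned inline, while a reply bounded at 8192 bytes
would still get a reply chunk.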

Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
