From: Sagi Grimberg <sagig@dev.mellanox.co.il>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 07/12] xprtrdma: Don't provide a reply chunk when expecting a short reply
Date: Tue, 14 Jul 2015 12:54:39 +0300
Message-ID: <55A4DC5F.9090403@dev.mellanox.co.il>
In-Reply-To: <2EB8EA33-9345-4D18-8BE1-39C4EB2658E2@oracle.com>
On 7/12/2015 9:38 PM, Chuck Lever wrote:
> Hi Sagi-
>
>
> On Jul 12, 2015, at 10:58 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>
>> On 7/9/2015 11:42 PM, Chuck Lever wrote:
>>> Currently Linux always offers a reply chunk, even for small replies
>>> (unless a read or write list is needed for the RPC operation).
>>>
>>> A comment in rpcrdma_marshal_req() reads:
>>>
>>>> Currently we try to not actually use read inline.
>>>> Reply chunks have the desirable property that
>>>> they land, packed, directly in the target buffers
>>>> without headers, so they require no fixup. The
>>>> additional RDMA Write op sends the same amount
>>>> of data, streams on-the-wire and adds no overhead
>>>> on receive. Therefore, we request a reply chunk
>>>> for non-writes wherever feasible and efficient.
>>>
>>> This considers only the network bandwidth cost of sending the RPC
>>> reply. For replies which are only a few dozen bytes, this is
>>> typically not a good trade-off.
>>>
>>> If the server chooses to return the reply inline:
>>>
>>> - The client has registered and invalidated a memory region to
>>> catch the reply, which is then not used
>>>
>>> If the server chooses to use the reply chunk:
>>>
>>> - The server sends a few bytes using a heavyweight RDMA WRITE
>>> operation. The entire RPC reply is conveyed in two RDMA
>>> operations (WRITE_ONLY, SEND) instead of one.
>>
>> Pipelined WRITE+SEND operations are hardly an overhead compared to
>> copying chunks of data.
>>
>>>
>>> Note that both the server and client have to prepare or copy the
>>> reply data anyway to construct these replies. There's no benefit to
>>> using an RDMA transfer since the host CPU has to be involved.
>>
>> I think that preparation (posting 1 or 2 WQEs) and copying
>> chunks of data of say 8K-16K might be different.
>
> Two points that are probably not clear from my patch description:
>
> 1. This patch affects only replies (usually much) smaller than the
> client’s inline threshold (1KB). Anything larger will continue
> to use RDMA transfer.
>
> 2. These replies are constructed in the RPC buffer by the server,
> and parsed in the receive buffer by the client. They are not
> simple data copies on either endpoint.
>
> Think NFS GETATTR: the server is gathering metadata from multiple
> sources, and XDR encoding it in the reply send buffer. The data
> is not copied, it is manipulated before the SEND.
>
> The client then XDR decodes the received stream and scatters the
> decoded results into multiple in-memory data structures.
>
> Because XDR encoding/decoding is involved, there really is no
> benefit to an RDMA transfer for these replies.
I see. Thanks for the clarification.
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
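
For readers following the thread, the decision discussed above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the names choose_reply_path(), INLINE_THRESHOLD, and RDMA_HDR_MIN are hypothetical, the 1KB threshold is the value quoted in the discussion, and the 28-byte header size is an assumption (four fixed header words plus three empty chunk lists).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for illustration; not the kernel's definitions. */
#define INLINE_THRESHOLD 1024u /* client's inline threshold (1KB, per the thread) */
#define RDMA_HDR_MIN       28u /* assumed RPC/RDMA header size with empty chunk lists */

enum reply_path {
	REPLY_INLINE, /* server returns the reply in a single SEND */
	REPLY_CHUNK   /* client registers memory; server uses RDMA WRITE + SEND */
};

/*
 * Decide whether the client should offer a reply chunk, given an upper
 * bound on the size of the XDR-encoded RPC reply. A short reply
 * (e.g. NFS GETATTR) that fits under the inline threshold together with
 * the transport header is received inline, so no memory region has to
 * be registered and later invalidated for it.
 */
static enum reply_path choose_reply_path(size_t max_reply_len)
{
	if (max_reply_len + RDMA_HDR_MIN <= INLINE_THRESHOLD)
		return REPLY_INLINE;
	return REPLY_CHUNK;
}
```

Under these assumptions a reply of a few dozen bytes stays inline, while an 8K-16K reply would still get a reply chunk, which matches point 1 in Chuck's message.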