From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v1 0/4] NFS/RDMA server patches for v4.19
Date: Wed, 1 Aug 2018 10:36:22 -0400
Message-ID: <20180801143622.GF16651@fieldses.org>
In-Reply-To: <20180727142007.21878.77494.stgit@klimt.1015granger.net>
On Fri, Jul 27, 2018 at 11:18:48AM -0400, Chuck Lever wrote:
> This short series includes clean-ups related to the performance work
> that you and I have discussed in the past.
Thanks, applying.
> Let me give an
> update on the progress of that work as context for these patches.
>
> We had discussed moving the generation of RDMA Read requests from
> ->recvfrom up into the NFSD proc functions that handle WRITE and
> SYMLINK operations. There were two reasons for this change:
>
> 1. To enable the upper layer to choose the pages that act as the
> RDMA Read sink buffer, rather than always using anonymous pages
> for this purpose
>
> 2. To reduce the average latency of ->recvfrom calls on RPC/RDMA,
> which are serialized per transport connection
>
> I was able to successfully prototype this change. The scope of this
> prototype was limited to exploring how to move the RDMA Read code.
> I have not yet tried to implement a per-FH page selection mechanism.
>
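For anyone else following along, here is a rough sketch of the idea --
every name below is invented for illustration and is not the prototype
code.  The point is that the NFSD proc function, which knows where the
payload should land, asks the transport to pull the Read chunks, rather
than ->recvfrom pulling them into anonymous pages up front:

/* Hypothetical upper-layer hook; rq_xprt and xpt_ops are real fields,
 * the xpo_pull_read_chunks op is made up for this sketch. */
static int nfsd_pull_write_payload(struct svc_rqst *rqstp,
				   struct page **sink_pages,
				   unsigned int npages)
{
	struct svc_xprt *xprt = rqstp->rq_xprt;

	/* TCP and friends: the payload arrived with the Call, nothing
	 * to do.  RPC/RDMA: issue the RDMA Read(s) now, into pages
	 * chosen by the upper layer. */
	if (!xprt->xpt_ops->xpo_pull_read_chunks)
		return 0;
	return xprt->xpt_ops->xpo_pull_read_chunks(rqstp, sink_pages,
						   npages);
}

Whether the sink pages come from the page cache or somewhere else would
then be the per-FH policy decision mentioned above.
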
> There was no measurable performance impact of this change. The good
> news is that this confirms that the RDMA Read code can be moved
> upstairs without negative performance consequences. The not-so-good
> news:
>
> - Serialization of ->recvfrom might not be the problem that I
> predicted.
>
> - I don't have a macro benchmark that mixes small NFS requests with
> requests carrying Read chunks in a way that can assess the
> serialization issue.
>
> - The most significant current bottleneck for NFS WRITE performance
> is on the Linux client, which obscures performance improvements in
> the server-side NFS WRITE path. The bottleneck is generic, not
> related to the use of pNFS or the choice of transport type.
Would it be practical to test with an artificial workload that takes the
client out of the picture?
But if there are client issues then they need to be fixed anyway, so
the better use of time may be fixing those first, and then we get to
test the server with more realistic workloads....
> To complete my prototype, I disabled the server's DRC. Going forward
> with this work will require some thought about how to deal with non-
> idempotent requests with Read chunks. Some possibilities:
>
> - For RPC Calls with Read chunks, don't include the payload in the
> checksum. This could be done by providing a per-transport checksum
> callout that would manage the details.
I believe the checksum is there to prevent RPCs from being incorrectly
treated as replays on XID wraparound.  If we skip it for writes (for
example), then we may be slightly increasing the chance of data
corruption in some cases.  I don't know if it's significant.
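To make the callout idea concrete, I'd imagine something along these
lines -- the xpo_cache_csum op is hypothetical, not existing code; the
transport would decide how much of the Call the DRC checksum covers, so
RPC/RDMA could stop before the Read chunk payload:

/* Hypothetical per-transport hook; csum_partial() and the rq_arg
 * layout are real, the xpo_cache_csum op is not. */
static __wsum svc_rdma_cache_csum(struct svc_rqst *rqstp)
{
	struct xdr_buf *buf = &rqstp->rq_arg;

	/* Cover only the inline portion of the Call.  Payload conveyed
	 * in Read chunks has not been pulled yet, so it is simply not
	 * included. */
	return csum_partial(buf->head[0].iov_base,
			    buf->head[0].iov_len, 0);
}

That is also why I wonder about the wraparound case: two different
WRITEs whose inline parts happen to match would then produce the same
checksum.
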
> - Support late RDMA Reads for session-based versions of NFS, but not
> for earlier versions, which utilize the legacy DRC.
To me that sounds like it might be simplest to start off with?
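That is, start by deferring the Read only when the legacy DRC is out of
the picture -- roughly like the following, where the helper and the
policy are entirely hypothetical, and I'm hand-waving about where the
minor version is actually known at that point:

/* Hypothetical gate: defer the RDMA Read only for requests that bypass
 * the legacy DRC; rq_vers is a real field, the policy is made up. */
static bool svc_rdma_may_defer_reads(struct svc_rqst *rqstp)
{
	/* Would still need the COMPOUND minorversion (>= 1) once it has
	 * been decoded; v2/v3 always go through the DRC. */
	return rqstp->rq_vers >= 4;
}
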
> - Adopt an entirely different DRC hashing mechanism.
I guess we could delay the hashing somehow too.
But I'd rather not invest a lot of time trying to make NFSv2/v3 better;
the priority for older protocols is just to avoid regressions.
--b.
>
> ---
>
> Chuck Lever (4):
> svcrdma: Avoid releasing a page in svc_xprt_release()
> svcrdma: Clean up Read chunk path
> NFSD: Refactor the generic write vector fill helper
> NFSD: Handle full-length symlinks
>
>
> fs/nfsd/nfs3proc.c | 5 ++
> fs/nfsd/nfs4proc.c | 23 ++-------
> fs/nfsd/nfsproc.c | 5 ++
> include/linux/sunrpc/svc.h | 4 +-
> net/sunrpc/svc.c | 78 ++++++++++++-------------------
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 9 ++--
> net/sunrpc/xprtrdma/svc_rdma_rw.c | 32 +++++--------
> net/sunrpc/xprtrdma/svc_rdma_sendto.c | 4 +-
> 8 files changed, 66 insertions(+), 94 deletions(-)
>
> --
> Chuck Lever