From: Jeff Layton <jlayton@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>,
linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: A comparison of the new nfsd iomodes (and an experimental one)
Date: Fri, 27 Mar 2026 07:32:08 -0400
Message-ID: <70e9c23a97d94a3dad5aa7f03f5a22c0950b00bf.camel@kernel.org>
In-Reply-To: <acWbrlvt_dUB9X3R@kernel.org>
On Thu, 2026-03-26 at 16:48 -0400, Mike Snitzer wrote:
> On Thu, Mar 26, 2026 at 12:35:15PM -0400, Jeff Layton wrote:
> > On Thu, 2026-03-26 at 11:30 -0400, Chuck Lever wrote:
> > > On 3/26/26 11:23 AM, Jeff Layton wrote:
> > > > I've been doing some benchmarking of the new nfsd iomodes, using
> > > > different fio-based workloads.
> > > >
> > > > The results have been interesting, but one thing that stands out is
> > > > that RWF_DONTCACHE is absolutely terrible for streaming write
> > > > workloads. That prompted me to experiment with a new iomode that added
> > > > some optimizations (DONTCACHE_LAZY).
> > > >
> > > > The results along with Claude's analysis are here:
> > > >
> > > > https://markdownpastebin.com/?id=387375d00b5443b3a2e37d58a062331f
> > > >
> > > > He gets a bit out over his skis on the upstream plan, but tl;dr is that
> > > > DONTCACHE_LAZY (which is DONTCACHE with some optimizations) outperforms
> > > > the other write iomodes.
> > >
> > > The analysis of the write modes seems plausible. I'm interested to hear
> > > what Mike and Jens have to say about that.
>
> Thanks for doing your testing and the summary, but I cannot help but
> feel like your test isn't coming close to realizing the O_DIRECT
> benefits over buffered that were covered in the past, e.g.:
> https://www.youtube.com/watch?v=tpPFDu9Nuuw
>
> Can Claude be made to watch a youtube video, summarize what it learned
> and then adapt its test-plan accordingly? ;)
>
I'm not sure it can. It's a good question, though. I'll ask it!
> Your bandwidth for 1MB sequential IO of 793 MB/s for O_DIRECT and
> 4,952 MB/s for buffered and dontcache is considerably less than the 72
> GB/s offered in Jon's testbed. Your testing isn't exposing the
> bottlenecks (contention) of the MM subsystem for buffered IO... I've
> not yet put my finger on _why_ that is.
That may very well be, but not everyone has a box as large as the one
you and Jon were working with.
> In Jon Flynn's testing he was using a working set of 312.5% of
> available server memory, and the single client test system was using
> fio with multiple threads and sync IO to write to 16 different mounts
> (one per NVMe of the NFS server) with nconnect=16 and RDMA.
>
This test attempted to simulate a high nconnect count from multiple
clients by running multiple fio processes. I also used the libnfs
backend for fio so I wouldn't need to worry about tuning the kernel
client.
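
For anyone who wants to reproduce something similar, a fio job file
along these lines should work. This is only a sketch of the setup
described above: the server name, export path, sizes, and job count
are placeholders, not the actual test parameters. It uses fio's
libnfs engine (ioengine=nfs), which speaks NFS directly from
userspace and so bypasses the kernel NFS client entirely:

```ini
; Hypothetical sketch -- server, export, and sizes are placeholders.
; ioengine=nfs is fio's libnfs-based engine; each job opens its own
; connection to the server, so no kernel-client tuning is needed.
[global]
ioengine=nfs
nfs_url=nfs://server.example/export
rw=write
bs=1M
size=8G
numjobs=8

[streaming-writers]
```

Multiple independent fio invocations of a file like this approximate
several clients hammering the server at once.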
> Raw performance of a single NVMe in Jon's testbed was over 14 GB/s --
> he has the ability to drive 16 NVMe in his single NFS server. So an
> order of magnitude more capable backend storage in Jon's NFS server.
>
Very true. This box only had a single SSD. I can try to find something
with more storage though for another test.
> My big concern is that your testing isn't exposing the MM bottlenecks
> of buffered IO... given that, it's not really providing useful
> results to compare with O_DIRECT.
>
> Putting that aside, yes, DONTCACHE as-is really isn't helpful; your
> lazy variant seems much more useful.
>
Right. I think that's the big takeaway from this. Ignoring Claude's
rambling about changing default iomodes in the server, RWF_DONTCACHE
just sucks for heavy writes. There is room for improvement there.
The big question I have is whether fixing RWF_DONTCACHE's writeback
behavior would give better results than what you were seeing with
O_DIRECT. Do you guys still have access to that test rig? I can send
you the patch if you want to test this and see how it does.
> > > One thing I'd like to hear more about is why Claude felt that disabling
> > > splice read was beneficial. My own benchmarking in that area has shown
> > > that splice read is always a win over not using splice.
> > >
> >
> > Good catch. That turns out to be a mistake in Claude's writeup.
> >
> > The test scripts left splice reads enabled for buffered reads, and the
> > results in the analysis reflect that. I (and it) have no idea why it
> > would recommend disabling them, when the testing all left them enabled
> > for buffered reads.
>
> Claude had to have picked up on the mutual exclusion with splice_read
> for both NFSD_IO_DONTCACHE and NFSD_IO_DIRECT io modes. So
> splice_read is implicitly disabled when testing NFSD_IO_DONTCACHE
> (which is buffered IO).
Yeah, Claude just got confused on the writeup. The benchmarking itself
was sound, AFAICT. Buffered reads used splice and it was disabled for
the other modes.
--
Jeff Layton <jlayton@kernel.org>
Thread overview: 8+ messages
2026-03-26 15:23 A comparison of the new nfsd iomodes (and an experimental one) Jeff Layton
2026-03-26 15:30 ` Chuck Lever
2026-03-26 16:35 ` Jeff Layton
2026-03-26 20:48 ` Mike Snitzer
2026-03-27 11:32 ` Jeff Layton [this message]
2026-03-27 13:19 ` Chuck Lever
2026-03-27 16:57 ` Mike Snitzer
2026-03-28 12:37 ` Jeff Layton