* performance r nfsd with RWF_DONTCACHE and larger wsizes
@ 2025-05-06 17:40 Jeff Layton
  2025-05-06 18:16 ` Chuck Lever
  2025-05-06 22:31 ` Dave Chinner
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Layton @ 2025-05-06 17:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-nfs
  Cc: Chuck Lever, Mike Snitzer, Trond Myklebust, Jens Axboe,
	Chris Mason, Anna Schumaker

FYI, I decided to try to get some numbers with Mike's RWF_DONTCACHE
patches for nfsd [1]. They add a module parameter that makes all reads
and writes use RWF_DONTCACHE.
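
For comparison, userspace passes the same flag on a per-call basis to
pwritev2()/preadv2(); the module parameter applies it to every nfsd
read and write instead. Below is a minimal userspace sketch. It assumes
Linux 6.14+ semantics, where 0x80 is the value in
include/uapi/linux/fs.h and unsupported filesystems return EOPNOTSUPP,
and glibc >= 2.26. write_dontcache() and demo_write_dontcache() are
hypothetical helpers, not part of Mike's patches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* 0x80 matches include/uapi/linux/fs.h in Linux 6.14+; define it in case
 * the libc headers predate the flag. */
#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080
#endif

/* Hypothetical helper: write len bytes at off with RWF_DONTCACHE. If the
 * kernel or filesystem rejects the flag (EOPNOTSUPP), retry as a normal
 * buffered write. *used_dontcache reports which path succeeded. */
ssize_t write_dontcache(int fd, const void *buf, size_t len, off_t off,
                        int *used_dontcache)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t ret;

    assert(used_dontcache != NULL);
    ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
    if (ret >= 0 || errno != EOPNOTSUPP) {
        *used_dontcache = (ret >= 0);
        return ret;
    }
    *used_dontcache = 0;
    return pwritev2(fd, &iov, 1, off, 0);   /* plain buffered fallback */
}

/* Usage sketch: round-trip a buffer through an unlinked temporary file. */
int demo_write_dontcache(int *used_dontcache)
{
    char path[] = "/tmp/dontcache-demo-XXXXXX";
    const char msg[] = "hello, dontcache";
    char back[sizeof(msg)];
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    unlink(path);   /* the file goes away once fd is closed */
    ok = write_dontcache(fd, msg, sizeof(msg), 0, used_dontcache) ==
             (ssize_t)sizeof(msg) &&
         pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
         memcmp(msg, back, sizeof(msg)) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

On a tmpfs /tmp or an older kernel the first call fails with
EOPNOTSUPP and the helper falls back to an ordinary buffered write.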

I had one host running knfsd with an XFS export and a second acting as
the NFS client. Both machines have tons of memory, so page cache
utilization is irrelevant for this test.

I tested sequential writes using the fio-seq-write.fio example job,
both with and without the module parameter enabled.

These numbers are from one run each, but they were pretty stable over
several runs:

# fio /usr/share/doc/fio/examples/fio-seq-write.fio

wsize=1M:

Normal:      WRITE: bw=1034MiB/s (1084MB/s), 1034MiB/s-1034MiB/s (1084MB/s-1084MB/s), io=910GiB (977GB), run=901326-901326msec
DONTCACHE:   WRITE: bw=649MiB/s (681MB/s), 649MiB/s-649MiB/s (681MB/s-681MB/s), io=571GiB (613GB), run=900001-900001msec

With a 1M wsize, DONTCACHE was about 37% slower than recent
(v6.14-ish) knfsd doing normal buffered I/O (649 vs. 1034 MiB/s).
Memory consumption was down, but these boxes have oodles of memory, so
I didn't notice much change there.
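
The bandwidth figures and relative differences can be cross-checked
from the raw fio fields (io= and run=). Below is a minimal sketch with
hypothetical helper names. fio rounds io= to whole GiB, so the
recomputed MiB/s is only approximate.

```c
#include <assert.h>

/* Recompute bandwidth in MiB/s from fio's io= (GiB) and run= (ms). */
double mib_per_sec(double io_gib, double run_ms)
{
    assert(run_ms > 0.0);
    return io_gib * 1024.0 / (run_ms / 1000.0);
}

/* Relative change of b versus baseline a, in percent (negative = slower). */
double pct_change(double a, double b)
{
    assert(a != 0.0);
    return (b - a) / a * 100.0;
}
```

For example, mib_per_sec(910, 901326) comes out near 1034, and
pct_change(1034, 649) is about -37%. For the 4M results further down,
pct_change(1053, 1191) gives about +13%.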

Chris suggested that the write sizes were too small in this test, so I
grabbed Chuck's patches [2] that raise the maximum RPC payload size to
4M, and patched the client to allow a wsize that large:

wsize=4M:

Normal:       WRITE: bw=1053MiB/s (1104MB/s), 1053MiB/s-1053MiB/s (1104MB/s-1104MB/s), io=930GiB (999GB), run=904526-904526msec
DONTCACHE:    WRITE: bw=1191MiB/s (1249MB/s), 1191MiB/s-1191MiB/s (1249MB/s-1249MB/s), io=1050GiB (1127GB), run=902781-902781msec

Normal buffered I/O barely changed (1053 vs. 1034 MiB/s), but with a
4M wsize DONTCACHE is now about 13% faster than normal buffered I/O
(1191 vs. 1053 MiB/s). My suspicion (unconfirmed) is that the
dropbehind flag causes partially-written large folios in the pagecache
to be written back too early, which slows down later writes to the
same folios.

I wonder if we need a heuristic that makes generic_write_sync() start
writeback immediately only when the whole folio is dirty, giving us
more time to gather writes before writeback begins?
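
To make the proposed heuristic concrete, here is a toy userspace model.
It is not kernel code, not an existing API, and not how
generic_write_sync() works today. The model tracks a single large folio
as 4 KiB blocks and allows writeback only once every block is dirty.
The 2 MiB maximum folio size and the toy_* names are assumptions for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not kernel code: one large folio tracked as 4 KiB blocks. */
#define BLK_SIZE   4096u
#define MAX_BLOCKS 512u                  /* up to a 2 MiB folio */

struct toy_folio {
    size_t   size;                       /* folio size in bytes */
    uint64_t dirty[MAX_BLOCKS / 64];     /* one bit per 4 KiB block */
};

void toy_folio_init(struct toy_folio *f, size_t size)
{
    assert(size > 0 && size % BLK_SIZE == 0 && size / BLK_SIZE <= MAX_BLOCKS);
    f->size = size;
    memset(f->dirty, 0, sizeof(f->dirty));
}

/* Mark [off, off + len) dirty, with offsets relative to the folio start.
 * Simplification: a block touched only partially is marked fully dirty. */
void toy_folio_write(struct toy_folio *f, size_t off, size_t len)
{
    assert(len > 0 && off + len <= f->size);
    for (size_t b = off / BLK_SIZE; b <= (off + len - 1) / BLK_SIZE; b++)
        f->dirty[b / 64] |= UINT64_C(1) << (b % 64);
}

/* The proposed heuristic: kick writeback only once every block is dirty. */
bool toy_should_kick_writeback(const struct toy_folio *f)
{
    size_t nblocks = f->size / BLK_SIZE;

    for (size_t b = 0; b < nblocks; b++)
        if (!(f->dirty[b / 64] & (UINT64_C(1) << (b % 64))))
            return false;
    return true;
}
```

With a 2 MiB folio, one 1M WRITE leaves it half dirty, so this model
defers writeback until the second 1M WRITE lands. A single 4M WRITE
would fill such a folio in one go.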

This might also be a good reason to think about a larger rsize/wsize
limit in the client.
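
As a rough illustration of what a larger wsize buys, the number of
WRITE RPCs needed for a given amount of data falls in proportion to
the wsize. Each RPC also hands the server a larger contiguous chunk to
place into the pagecache. Below is a minimal sketch with a
hypothetical helper name. It ignores short writes and unaligned start
offsets.

```c
#include <assert.h>
#include <stdint.h>

/* WRITE RPCs needed to send len bytes when each RPC carries at most
 * wsize bytes (ceiling division written to avoid overflow). */
uint64_t write_rpcs_needed(uint64_t len, uint64_t wsize)
{
    assert(wsize > 0);
    return len / wsize + (len % wsize != 0);
}
```

For 1 GiB of data, this is 1024 RPCs at a 1M wsize versus 256 at 4M.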

I'd also like to test reads with this flag, but I'm currently getting
that EOPNOTSUPP error back when I try.
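
Outside of nfsd, you can check whether a given kernel and filesystem
accept the flag for reads by issuing a one-byte preadv2() with
RWF_DONTCACHE. This is a minimal local sketch. It does not exercise
the nfsd read path, so it cannot say where the EOPNOTSUPP above comes
from. probe_dontcache_read() is a hypothetical helper, and the 0x80
fallback value assumes the Linux 6.14+ uapi definition.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080   /* assumed Linux 6.14+ uapi value */
#endif

/* Returns 1 if a RWF_DONTCACHE read of path succeeds, 0 if the kernel or
 * filesystem rejects the flag with EOPNOTSUPP, or -errno on other errors. */
int probe_dontcache_read(const char *path)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    int fd = open(path, O_RDONLY);
    int ret;

    if (fd < 0)
        return -errno;
    if (preadv2(fd, &iov, 1, 0, RWF_DONTCACHE) >= 0)
        ret = 1;
    else if (errno == EOPNOTSUPP)
        ret = 0;
    else
        ret = -errno;
    close(fd);   /* ret is already computed, so close() may clobber errno */
    return ret;
}
```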

[1]: https://lore.kernel.org/linux-nfs/20250220171205.12092-1-snitzer@kernel.org/
[2]: https://lore.kernel.org/linux-nfs/20250428193702.5186-15-cel@kernel.org/
-- 
Jeff Layton <jlayton@kernel.org>



Thread overview: 12+ messages
2025-05-06 17:40 performance r nfsd with RWF_DONTCACHE and larger wsizes Jeff Layton
2025-05-06 18:16 ` Chuck Lever
2025-05-06 18:30   ` Jeff Layton
2025-05-06 22:31 ` Dave Chinner
2025-05-07  0:06   ` Jeff Layton
2025-05-07  2:50     ` Dave Chinner
2025-05-07 13:43       ` Chuck Lever
2025-05-08  1:13         ` Dave Chinner
2025-05-07 21:50       ` Mike Snitzer
2025-05-08  0:09         ` Jeff Layton
2025-05-08  2:05           ` Dave Chinner
2025-05-08  1:50         ` Dave Chinner
